Master's in Data Science - Machine Learning

Handling missing values, outliers, and correlations

Author: Ramón Morillo Barrera

Dataset: Application data

In this notebook we carry out a graphical exploratory analysis to visualize and understand how the variables behave. We handle null (missing) values and outliers, and study the correlation between variables.

As noted earlier, the train-test split is stratified because the target variable is imbalanced.

Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
from termcolor import colored, cprint
import scipy.stats as ss
import warnings
import sys
from scipy.stats import chi2_contingency

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

Functions

In [2]:
sys.path.append('../src')
import funciones_auxiliares as f_aux
sys.path.remove('../src')

# Constant: random seed
seed = 12354

Loading the dataset

In [3]:
df_loan = pd.read_csv('../../data_loan_status/data_preprocessing/pd_data_initial_preprocessing.csv')
df_loan.head()
Out[3]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE FONDKAPREMONT_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_MODE FLOORSMIN_AVG FLOORSMIN_MEDI YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_MEDI ELEVATORS_AVG ELEVATORS_MODE WALLSMATERIAL_MODE APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_MEDI ENTRANCES_MODE ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI HOUSETYPE_MODE FLOORSMAX_MODE FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE EXT_SOURCE_3 AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_YEAR AMT_REQ_CREDIT_BUREAU_QRT NAME_TYPE_SUITE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY CNT_FAM_MEMBERS DAYS_LAST_PHONE_CHANGE HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION ORGANIZATION_TYPE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER AMT_CREDIT AMT_INCOME_TOTAL CNT_CHILDREN NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE NAME_EDUCATION_TYPE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE TARGET FLAG_OWN_REALTY LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY WEEKDAY_APPR_PROCESS_START FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 
FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21
0 100002 0.0143 0.0144 0.0144 0.0000 0.0000 0.0 reg oper account 0.0205 0.0202 0.022 0.1250 0.1250 0.1250 0.6341 0.6243 0.6192 NaN 0.0375 0.0369 0.0377 0.0383 0.0369 0.0369 0.083037 0.0000 0.0 0.00 0.00 0.00 0.0000 Stone, brick 0.0247 0.0252 0.0250 0.0690 0.0690 0.0690 0.0190 0.0198 0.0193 block of flats 0.0833 0.0833 0.0833 0.9722 0.9722 0.9722 0.0149 No Laborers 0.139376 0.0 0.0 0.0 0.0 1.0 0.0 Unaccompanied 2.0 2.0 2.0 2.0 0.262949 351000.0 24700.5 1.0 -1134.0 10 0 Business Entity Type 3 Cash loans N M 406597.5 202500.0 0 Working Single / not married House / apartment 0.018801 Secondary / secondary special -9461 -637 -3648.0 -2120 1 1 0 1 1 Y 0 0 2 2 WEDNESDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 100003 0.0605 0.0608 0.0497 0.0039 0.0039 0.0 reg oper account 0.0787 0.0773 0.079 0.3333 0.3333 0.3333 0.8040 0.7987 0.7960 NaN 0.0132 0.0130 0.0128 0.0538 0.0529 0.0529 0.311267 0.0098 0.0 0.01 0.08 0.08 0.0806 Block 0.0959 0.0924 0.0968 0.0345 0.0345 0.0345 0.0549 0.0554 0.0558 block of flats 0.2917 0.2917 0.2917 0.9851 0.9851 0.9851 0.0714 No Core staff NaN 0.0 0.0 0.0 0.0 0.0 0.0 Family 1.0 0.0 1.0 0.0 0.622246 1129500.0 35698.5 2.0 -828.0 11 0 School Cash loans N F 1293502.5 270000.0 0 State servant Married House / apartment 0.003541 Higher education -16765 -1188 -1186.0 -291 1 1 0 1 0 N 0 0 1 1 MONDAY 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 100004 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 26.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Laborers 0.729567 0.0 0.0 0.0 0.0 0.0 0.0 Unaccompanied 0.0 0.0 0.0 0.0 0.555912 135000.0 6750.0 1.0 -815.0 9 0 Government Revolving loans Y M 135000.0 67500.0 0 Working Single / not married House / apartment 0.010032 Secondary / secondary special -19046 -225 -4260.0 -2531 1 1 1 1 0 Y 0 0 2 2 MONDAY 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 100006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Laborers NaN NaN NaN NaN NaN NaN NaN Unaccompanied 2.0 0.0 2.0 0.0 0.650442 297000.0 29686.5 2.0 -617.0 17 0 Business Entity Type 3 Cash loans N F 312682.5 135000.0 0 Working Civil marriage House / apartment 0.008019 Secondary / secondary special -19005 -3039 -9833.0 -2437 1 1 0 1 0 Y 0 0 2 2 WEDNESDAY 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 100007 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Core staff NaN 0.0 0.0 0.0 0.0 0.0 0.0 Unaccompanied 0.0 0.0 0.0 0.0 0.322738 513000.0 21865.5 1.0 -1106.0 11 0 Religion Cash loans N M 513000.0 121500.0 0 Working Single / not married House / apartment 0.028663 Secondary / secondary special -19932 -3038 -4311.0 -3458 1 1 0 1 0 Y 0 0 2 2 THURSDAY 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
In [4]:
df_loan.columns
Out[4]:
Index(['SK_ID_CURR', 'COMMONAREA_AVG', 'COMMONAREA_MEDI', 'COMMONAREA_MODE',
       'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAPARTMENTS_MEDI',
       'NONLIVINGAPARTMENTS_MODE', 'FONDKAPREMONT_MODE',
       'LIVINGAPARTMENTS_MEDI', 'LIVINGAPARTMENTS_AVG',
       ...
       'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_19',
       'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_16',
       'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_20',
       'FLAG_DOCUMENT_21'],
      dtype='object', length=122)

Converting categorical variable types

I convert the object-typed variables to the category dtype.

In [5]:
# Cast the categorical variables identified by the helper to the category dtype
list_var_cat, other = f_aux.dame_variables_categoricas(dataset=df_loan)
df_loan[list_var_cat] = df_loan[list_var_cat].astype("category")
# Treat the float columns as the continuous variables
list_var_continuous = list(df_loan.select_dtypes('float').columns)
df_loan[list_var_continuous] = df_loan[list_var_continuous].astype(float)
df_loan.dtypes
Out[5]:
SK_ID_CURR                         int64
COMMONAREA_AVG                   float64
COMMONAREA_MEDI                  float64
COMMONAREA_MODE                  float64
NONLIVINGAPARTMENTS_AVG          float64
NONLIVINGAPARTMENTS_MEDI         float64
NONLIVINGAPARTMENTS_MODE         float64
FONDKAPREMONT_MODE              category
LIVINGAPARTMENTS_MEDI            float64
LIVINGAPARTMENTS_AVG             float64
LIVINGAPARTMENTS_MODE            float64
FLOORSMIN_MODE                   float64
FLOORSMIN_AVG                    float64
FLOORSMIN_MEDI                   float64
YEARS_BUILD_MODE                 float64
YEARS_BUILD_MEDI                 float64
YEARS_BUILD_AVG                  float64
OWN_CAR_AGE                      float64
LANDAREA_MEDI                    float64
LANDAREA_AVG                     float64
LANDAREA_MODE                    float64
BASEMENTAREA_MODE                float64
BASEMENTAREA_AVG                 float64
BASEMENTAREA_MEDI                float64
EXT_SOURCE_1                     float64
NONLIVINGAREA_AVG                float64
NONLIVINGAREA_MODE               float64
NONLIVINGAREA_MEDI               float64
ELEVATORS_MEDI                   float64
ELEVATORS_AVG                    float64
ELEVATORS_MODE                   float64
WALLSMATERIAL_MODE              category
APARTMENTS_AVG                   float64
APARTMENTS_MODE                  float64
APARTMENTS_MEDI                  float64
ENTRANCES_MEDI                   float64
ENTRANCES_MODE                   float64
ENTRANCES_AVG                    float64
LIVINGAREA_AVG                   float64
LIVINGAREA_MODE                  float64
LIVINGAREA_MEDI                  float64
HOUSETYPE_MODE                  category
FLOORSMAX_MODE                   float64
FLOORSMAX_AVG                    float64
FLOORSMAX_MEDI                   float64
YEARS_BEGINEXPLUATATION_MODE     float64
YEARS_BEGINEXPLUATATION_AVG      float64
YEARS_BEGINEXPLUATATION_MEDI     float64
TOTALAREA_MODE                   float64
EMERGENCYSTATE_MODE             category
OCCUPATION_TYPE                 category
EXT_SOURCE_3                     float64
AMT_REQ_CREDIT_BUREAU_WEEK       float64
AMT_REQ_CREDIT_BUREAU_MON        float64
AMT_REQ_CREDIT_BUREAU_HOUR       float64
AMT_REQ_CREDIT_BUREAU_DAY        float64
AMT_REQ_CREDIT_BUREAU_YEAR       float64
AMT_REQ_CREDIT_BUREAU_QRT        float64
NAME_TYPE_SUITE                 category
OBS_60_CNT_SOCIAL_CIRCLE         float64
DEF_60_CNT_SOCIAL_CIRCLE         float64
OBS_30_CNT_SOCIAL_CIRCLE         float64
DEF_30_CNT_SOCIAL_CIRCLE         float64
EXT_SOURCE_2                     float64
AMT_GOODS_PRICE                  float64
AMT_ANNUITY                      float64
CNT_FAM_MEMBERS                  float64
DAYS_LAST_PHONE_CHANGE           float64
HOUR_APPR_PROCESS_START            int64
REG_REGION_NOT_LIVE_REGION         int64
ORGANIZATION_TYPE               category
NAME_CONTRACT_TYPE              category
FLAG_OWN_CAR                    category
CODE_GENDER                     category
AMT_CREDIT                       float64
AMT_INCOME_TOTAL                 float64
CNT_CHILDREN                       int64
NAME_INCOME_TYPE                category
NAME_FAMILY_STATUS              category
NAME_HOUSING_TYPE               category
REGION_POPULATION_RELATIVE       float64
NAME_EDUCATION_TYPE             category
DAYS_BIRTH                         int64
DAYS_EMPLOYED                      int64
DAYS_REGISTRATION                float64
DAYS_ID_PUBLISH                    int64
FLAG_MOBIL                         int64
FLAG_EMP_PHONE                     int64
FLAG_WORK_PHONE                    int64
FLAG_CONT_MOBILE                   int64
TARGET                             int64
FLAG_OWN_REALTY                 category
LIVE_REGION_NOT_WORK_REGION        int64
FLAG_EMAIL                         int64
REGION_RATING_CLIENT               int64
REGION_RATING_CLIENT_W_CITY        int64
WEEKDAY_APPR_PROCESS_START      category
FLAG_PHONE                         int64
REG_CITY_NOT_LIVE_CITY             int64
REG_CITY_NOT_WORK_CITY             int64
LIVE_CITY_NOT_WORK_CITY            int64
REG_REGION_NOT_WORK_REGION         int64
FLAG_DOCUMENT_4                    int64
FLAG_DOCUMENT_5                    int64
FLAG_DOCUMENT_2                    int64
FLAG_DOCUMENT_3                    int64
FLAG_DOCUMENT_11                   int64
FLAG_DOCUMENT_10                   int64
FLAG_DOCUMENT_9                    int64
FLAG_DOCUMENT_8                    int64
FLAG_DOCUMENT_7                    int64
FLAG_DOCUMENT_6                    int64
FLAG_DOCUMENT_12                   int64
FLAG_DOCUMENT_13                   int64
FLAG_DOCUMENT_19                   int64
FLAG_DOCUMENT_18                   int64
FLAG_DOCUMENT_17                   int64
FLAG_DOCUMENT_16                   int64
FLAG_DOCUMENT_15                   int64
FLAG_DOCUMENT_14                   int64
FLAG_DOCUMENT_20                   int64
FLAG_DOCUMENT_21                   int64
dtype: object
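
The conversion above relies on `f_aux.dame_variables_categoricas`, whose source is not shown in this notebook. As a rough illustration only, the sketch below does a similar object-to-category cast with plain pandas. The function name `to_category` and the toy data are hypothetical, and the real helper may select columns differently.

```python
import pandas as pd


def to_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every object-dtype column cast to category.

    Hypothetical stand-in for the notebook's helper, not its actual code.
    """
    obj_cols = df.select_dtypes(include="object").columns
    # astype with a dict converts only the listed columns and returns a new frame
    return df.astype({col: "category" for col in obj_cols})


# Toy data (made up for illustration); dtype=object is set explicitly
toy = pd.DataFrame({
    "CODE_GENDER": pd.Series(["M", "F", "F"], dtype="object"),
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0],
    "CNT_CHILDREN": [0, 0, 1],
})
converted = to_category(toy)
print(converted.dtypes)
```

Numeric columns are left untouched, which is why `AMT_CREDIT` and `CNT_CHILDREN` keep their original dtypes.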

Stratified train-test split

I split the dataset into train and test sets while preserving the proportion of the target variable. First, though, I plot that proportion.

In [6]:
target_count = df_loan.groupby('TARGET').agg({'TARGET':'count'}).reset_index(drop=True)
target_count['value'] = list(target_count.index)
target_count
Out[6]:
TARGET value
0 282686 0
1 24825 1
In [7]:
df_plot_loan_status = df_loan['TARGET']\
        .value_counts(normalize=True)\
        .mul(100).rename('percent').reset_index()

df_plot_loan_status_conteo = df_loan['TARGET'].value_counts(normalize=True).reset_index()
df_plot_loan_status_conteo
Out[7]:
TARGET proportion
0 0 0.919271
1 1 0.080729
In [8]:
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots(figsize=(10, 6))  # Larger figure size

# Bar chart
sns.barplot(
    data=target_count, 
    x='value', 
    y='TARGET', 
    ax=ax, 
    hue='value', 
    dodge=False,  # No gap between bars
    palette="pastel",  
    edgecolor="0.2"    # Add borders to the bars
)

# Title and axis labels
ax.set_title('Count of TARGET variable values', fontsize=18, fontweight='bold', color='darkblue')
ax.set_ylabel('Count', fontsize=14, color='darkgrey')
ax.set_xlabel('Value', fontsize=14, color='darkgrey')

# Add the count labels above the bars
for container in ax.containers:
    ax.bar_label(container, fmt='{:,.0f}', label_type="edge", padding=3, fontsize=12, color="black")
[Figure: bar chart of TARGET value counts]
In [9]:
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots(figsize=(10, 6))  # Larger figure size

# Bar chart
sns.barplot(
    data=df_plot_loan_status_conteo, 
    x='TARGET', 
    y='proportion', 
    ax=ax, 
    hue='TARGET', 
    dodge=False,  # No gap between bars
    palette="pastel",  
    edgecolor="0.2"    # Add borders to the bars
)

# Title and axis labels (this chart shows proportions, not counts)
ax.set_title('Proportion of TARGET variable values', fontsize=18, fontweight='bold', color='darkblue')
ax.set_ylabel('Proportion', fontsize=14, color='darkgrey')
ax.set_xlabel('Value', fontsize=14, color='darkgrey')

# Add the percentage labels above the bars
for container in ax.containers:
    ax.bar_label(container, fmt='{:,.2%}', label_type="edge", padding=3, fontsize=12, color="black")
[Figure: bar chart of TARGET value proportions]

I computed and plotted the values of the TARGET variable so I can check that, thanks to stratification, the proportions are preserved after the train-test split.

In [10]:
# Stratified split: keeps the TARGET proportions equal in train and test
X_df_loan, X_df_loan_test, y_df_loan, y_df_loan_test = train_test_split(df_loan.drop('TARGET',axis=1), 
                                                                     df_loan['TARGET'], 
                                                                     stratify=df_loan['TARGET'], 
                                                                     test_size=0.2,
                                                                     random_state=seed)  # fixed seed so the split is reproducible
df_loan_train = pd.concat([X_df_loan, y_df_loan],axis=1)
df_loan_test = pd.concat([X_df_loan_test, y_df_loan_test],axis=1)
In [11]:
print(f'''
\033[1mTRAIN\033[0m:
{y_df_loan.value_counts(normalize=True)}

\033[1mTEST\033[0m:
{y_df_loan_test.value_counts(normalize=True)}''')
TRAIN:
TARGET
0    0.919271
1    0.080729
Name: proportion, dtype: float64

TEST:
TARGET
0    0.919272
1    0.080728
Name: proportion, dtype: float64

The stratified split worked as intended: the proportion of the TARGET variable is the same in train and test, up to rounding.
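
The visual comparison can also be turned into a numeric check. The sketch below uses synthetic data with roughly 8% positives (an assumption chosen to resemble TARGET, not the real dataset) and measures the largest difference in class share between the two partitions. The names `class_share` and `max_gap` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic imbalanced target: about 8% of ones
y = pd.Series((rng.random(10_000) < 0.08).astype(int), name="TARGET")
X = pd.DataFrame({"x": rng.normal(size=len(y))})

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0
)


def class_share(s: pd.Series) -> pd.Series:
    """Relative frequency of each class, ordered by class label."""
    return s.value_counts(normalize=True).sort_index()


# Largest absolute difference in class share between train and test
max_gap = (class_share(y_tr) - class_share(y_te)).abs().max()
print(f"largest difference in class share: {max_gap:.6f}")
```

With stratification the gap is bounded by rounding effects, so it shrinks as the partitions grow.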

Descriptive visualization of the data

We look at the proportion of null values by column and by row, and then visualize how the other variables relate to the TARGET variable.

In [12]:
# Null counts per column and per row of the training set, in descending order
pd_series_null_columns = df_loan_train.isnull().sum().sort_values(ascending=False)
pd_series_null_rows = df_loan_train.isnull().sum(axis=1).sort_values(ascending=False)
print(pd_series_null_columns.shape, pd_series_null_rows.shape)

pd_null_columnas = pd.DataFrame(pd_series_null_columns, columns=['nulos_columnas'])     
pd_null_filas = pd.DataFrame(pd_series_null_rows, columns=['nulos_filas'])  
# Take TARGET from the training set itself (the assignment aligns on the index)
pd_null_filas['TARGET'] = df_loan_train['TARGET'].copy()
# Convert counts to fractions: columns over the number of rows, rows over the number of columns
pd_null_columnas['porcentaje_columnas'] = pd_null_columnas['nulos_columnas']/df_loan_train.shape[0]
pd_null_filas['porcentaje_filas']= pd_null_filas['nulos_filas']/df_loan_train.shape[1]
(122,) (246008,)
In [13]:
pd_null_columnas
Out[13]:
nulos_columnas porcentaje_columnas
COMMONAREA_AVG 171905 0.698778
COMMONAREA_MEDI 171905 0.698778
COMMONAREA_MODE 171905 0.698778
NONLIVINGAPARTMENTS_MODE 170809 0.694323
NONLIVINGAPARTMENTS_AVG 170809 0.694323
NONLIVINGAPARTMENTS_MEDI 170809 0.694323
FONDKAPREMONT_MODE 168190 0.683677
LIVINGAPARTMENTS_MEDI 168161 0.683559
LIVINGAPARTMENTS_AVG 168161 0.683559
LIVINGAPARTMENTS_MODE 168161 0.683559
FLOORSMIN_AVG 166886 0.678376
FLOORSMIN_MODE 166886 0.678376
FLOORSMIN_MEDI 166886 0.678376
YEARS_BUILD_MODE 163614 0.665076
YEARS_BUILD_AVG 163614 0.665076
YEARS_BUILD_MEDI 163614 0.665076
OWN_CAR_AGE 162412 0.660190
LANDAREA_MEDI 146058 0.593712
LANDAREA_MODE 146058 0.593712
LANDAREA_AVG 146058 0.593712
BASEMENTAREA_AVG 143933 0.585074
BASEMENTAREA_MODE 143933 0.585074
BASEMENTAREA_MEDI 143933 0.585074
EXT_SOURCE_1 138614 0.563453
NONLIVINGAREA_AVG 135696 0.551592
NONLIVINGAREA_MODE 135696 0.551592
NONLIVINGAREA_MEDI 135696 0.551592
ELEVATORS_MODE 131147 0.533101
ELEVATORS_MEDI 131147 0.533101
ELEVATORS_AVG 131147 0.533101
WALLSMATERIAL_MODE 125108 0.508553
APARTMENTS_MEDI 124908 0.507740
APARTMENTS_AVG 124908 0.507740
APARTMENTS_MODE 124908 0.507740
ENTRANCES_MODE 123895 0.503622
ENTRANCES_MEDI 123895 0.503622
ENTRANCES_AVG 123895 0.503622
LIVINGAREA_AVG 123444 0.501789
LIVINGAREA_MEDI 123444 0.501789
LIVINGAREA_MODE 123444 0.501789
HOUSETYPE_MODE 123422 0.501699
FLOORSMAX_MODE 122459 0.497785
FLOORSMAX_MEDI 122459 0.497785
FLOORSMAX_AVG 122459 0.497785
YEARS_BEGINEXPLUATATION_AVG 120009 0.487826
YEARS_BEGINEXPLUATATION_MODE 120009 0.487826
YEARS_BEGINEXPLUATATION_MEDI 120009 0.487826
TOTALAREA_MODE 118723 0.482598
EMERGENCYSTATE_MODE 116615 0.474029
OCCUPATION_TYPE 76962 0.312843
EXT_SOURCE_3 48773 0.198258
AMT_REQ_CREDIT_BUREAU_HOUR 33129 0.134666
AMT_REQ_CREDIT_BUREAU_WEEK 33129 0.134666
AMT_REQ_CREDIT_BUREAU_MON 33129 0.134666
AMT_REQ_CREDIT_BUREAU_YEAR 33129 0.134666
AMT_REQ_CREDIT_BUREAU_DAY 33129 0.134666
AMT_REQ_CREDIT_BUREAU_QRT 33129 0.134666
NAME_TYPE_SUITE 1040 0.004228
DEF_30_CNT_SOCIAL_CIRCLE 821 0.003337
OBS_60_CNT_SOCIAL_CIRCLE 821 0.003337
DEF_60_CNT_SOCIAL_CIRCLE 821 0.003337
OBS_30_CNT_SOCIAL_CIRCLE 821 0.003337
EXT_SOURCE_2 537 0.002183
AMT_GOODS_PRICE 217 0.000882
AMT_ANNUITY 11 0.000045
CNT_FAM_MEMBERS 1 0.000004
DAYS_LAST_PHONE_CHANGE 1 0.000004
SK_ID_CURR 0 0.000000
HOUR_APPR_PROCESS_START 0 0.000000
REG_REGION_NOT_LIVE_REGION 0 0.000000
ORGANIZATION_TYPE 0 0.000000
NAME_CONTRACT_TYPE 0 0.000000
FLAG_OWN_CAR 0 0.000000
CODE_GENDER 0 0.000000
AMT_CREDIT 0 0.000000
AMT_INCOME_TOTAL 0 0.000000
CNT_CHILDREN 0 0.000000
NAME_INCOME_TYPE 0 0.000000
NAME_FAMILY_STATUS 0 0.000000
NAME_HOUSING_TYPE 0 0.000000
REGION_POPULATION_RELATIVE 0 0.000000
NAME_EDUCATION_TYPE 0 0.000000
DAYS_BIRTH 0 0.000000
DAYS_EMPLOYED 0 0.000000
DAYS_REGISTRATION 0 0.000000
DAYS_ID_PUBLISH 0 0.000000
FLAG_MOBIL 0 0.000000
FLAG_EMP_PHONE 0 0.000000
FLAG_WORK_PHONE 0 0.000000
FLAG_CONT_MOBILE 0 0.000000
FLAG_OWN_REALTY 0 0.000000
LIVE_REGION_NOT_WORK_REGION 0 0.000000
FLAG_EMAIL 0 0.000000
REGION_RATING_CLIENT 0 0.000000
REGION_RATING_CLIENT_W_CITY 0 0.000000
WEEKDAY_APPR_PROCESS_START 0 0.000000
FLAG_PHONE 0 0.000000
REG_CITY_NOT_LIVE_CITY 0 0.000000
REG_CITY_NOT_WORK_CITY 0 0.000000
LIVE_CITY_NOT_WORK_CITY 0 0.000000
REG_REGION_NOT_WORK_REGION 0 0.000000
FLAG_DOCUMENT_4 0 0.000000
FLAG_DOCUMENT_5 0 0.000000
FLAG_DOCUMENT_2 0 0.000000
FLAG_DOCUMENT_3 0 0.000000
FLAG_DOCUMENT_11 0 0.000000
FLAG_DOCUMENT_10 0 0.000000
FLAG_DOCUMENT_9 0 0.000000
FLAG_DOCUMENT_8 0 0.000000
FLAG_DOCUMENT_7 0 0.000000
FLAG_DOCUMENT_6 0 0.000000
FLAG_DOCUMENT_12 0 0.000000
FLAG_DOCUMENT_13 0 0.000000
FLAG_DOCUMENT_19 0 0.000000
FLAG_DOCUMENT_18 0 0.000000
FLAG_DOCUMENT_17 0 0.000000
FLAG_DOCUMENT_16 0 0.000000
FLAG_DOCUMENT_15 0 0.000000
FLAG_DOCUMENT_14 0 0.000000
FLAG_DOCUMENT_20 0 0.000000
FLAG_DOCUMENT_21 0 0.000000
TARGET 0 0.000000
In [14]:
pd_null_filas
Out[14]:
nulos_filas TARGET porcentaje_filas
269786 61 0 0.5
69707 61 0 0.5
244833 61 0 0.5
197736 61 0 0.5
150206 61 0 0.5
... ... ... ...
134994 0 0 0.0
85268 0 0 0.0
216116 0 1 0.0
156655 0 0 0.0
245999 0 0 0.0

246008 rows × 3 columns
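
As a side note, the same fractions can be obtained in a single step, because the mean of a boolean mask is the share of `True` values. The following is a minimal sketch on a made-up toy frame, not on the loan data.

```python
import numpy as np
import pandas as pd

# Toy frame with known missing-value pattern (illustrative data only)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1, 2, 3, 4],
})

# Share of missing values per column
null_share_cols = toy.isnull().mean().sort_values(ascending=False)
# Share of missing values per row
null_share_rows = toy.isnull().mean(axis=1)

print(null_share_cols)
print(null_share_rows)
```

This is equivalent to dividing `isnull().sum()` by the relevant dimension of `shape`, as done in the cells above.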

Next we visualize the distribution of the numerical and categorical variables against the TARGET variable.

I build lists of variables by type so they can be plotted below.

In [15]:
df_loan_bool, df_loan_cat, df_loan_num = f_aux.tipos_vars1(df_loan,False)
In [16]:
warnings.filterwarnings('ignore')
for i in df_loan_train.columns:
    if i in df_loan_num:
        f_aux.double_plot(df_loan_train, col_name=i, is_cont=True, target='TARGET')
    elif (i in df_loan_bool or i in df_loan_cat) and i != 'TARGET':
        f_aux.double_plot(df_loan_train, col_name=i, is_cont=False, target='TARGET')
[Figures: 121 plots produced by f_aux.double_plot, one per variable, each shown against TARGET]
In [17]:
df_loan_train['ORGANIZATION_TYPE'] = df_loan_train['ORGANIZATION_TYPE'].astype('category')

f_aux.double_plot(df_loan_train, col_name='ORGANIZATION_TYPE', is_cont=False, target='TARGET')
[Figure: f_aux.double_plot of ORGANIZATION_TYPE against TARGET]

Analysis of the plots

The plots reveal several points worth noting, such as the imbalance of the target variable mentioned earlier and the large number of null values in some variables, which we transform later. Below we comment on how some variables behave in relation to our target variable, TARGET.

  1. Clients with older cars tend to fall behind on their loan payments.

  2. Payment difficulty appears to increase for clients with lower scores in EXT_SOURCE_1, EXT_SOURCE_2, and EXT_SOURCE_3, which are normalized scores from an external data source.

  3. Clients whose homes have wooden walls are the most prone to falling behind on their loan payments.

  4. Clients in lower-skilled jobs (low-skill laborers, drivers, waiters) are more likely to fall behind on their loan payments.

  5. As the number of credit enquiries made before the loan application (AMT_REQ_CREDIT_BUREAU) increases, so does the probability of a late repayment.

  6. The larger the client's family, the more likely the client is to fall behind on a loan payment.

  7. Clients who changed their mobile phone relatively recently (DAYS_LAST_PHONE_CHANGE) are more likely to have difficulty repaying the loan.

  8. Men are more prone than women to have difficulty repaying the loan (CODE_GENDER).

  9. The more children the client has, the greater the payment difficulty (CNT_CHILDREN).

  10. Clients on maternity leave or unemployed are more prone to have difficulty repaying the loan (NAME_INCOME_TYPE).

  11. Clients with a higher level of education are less prone to have difficulty repaying the loan (NAME_EDUCATION_TYPE).

  12. Younger clients (DAYS_BIRTH) appear to have more difficulty repaying the loan.

  13. Clients who changed their ID document shortly before applying for the loan (DAYS_ID_PUBLISH), as well as those who changed their registration shortly before applying (DAYS_REGISTRATION), have more difficulty repaying it.

  14. The higher the rating of the region where the client lives (REGION_RATING_CLIENT), the higher the probability of payment difficulties.

  15. Clients who provided document 2 (FLAG_DOCUMENT_2) are more likely to have difficulty repaying the loan.
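
Observations like these come from reading the plots, and they can be backed with a number by computing the share of TARGET = 1 within each category. The sketch below shows one way to do that. The function `target_rate_by` is a hypothetical helper, and the toy rows are invented for illustration, so they do not reflect the real dataset.

```python
import pandas as pd

# Invented toy rows; category labels mirror NAME_EDUCATION_TYPE values
toy = pd.DataFrame({
    "NAME_EDUCATION_TYPE": [
        "Higher education", "Higher education",
        "Secondary / secondary special", "Secondary / secondary special",
        "Secondary / secondary special", "Lower secondary",
    ],
    "TARGET": [0, 0, 0, 1, 0, 1],
})


def target_rate_by(df: pd.DataFrame, col: str, target: str = "TARGET") -> pd.DataFrame:
    """Share of target == 1 and number of rows for each category of col."""
    grouped = df.groupby(col, observed=True)[target]
    # Mean of a 0/1 target is the positive rate; count gives the group size
    return grouped.agg(rate="mean", n="count").sort_values("rate", ascending=False)


rates = target_rate_by(toy, "NAME_EDUCATION_TYPE")
print(rates)
```

Reporting the group size `n` next to the rate matters, because a high rate in a category with very few clients is weak evidence.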

Handling continuous variables

Handling outliers
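
The cell that follows calls `f_aux.get_deviation_of_mean_perc` with `multiplier=3`. The helper's source is not shown, so the sketch below is only an assumption about that kind of rule: flag values further than `multiplier` standard deviations from the mean, then look at the target distribution among the flagged rows. The function name `outlier_summary`, its output keys, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def outlier_summary(df: pd.DataFrame, col: str, target: str = "TARGET",
                    multiplier: float = 3) -> dict:
    """Hypothetical sketch: flag values outside mean +/- multiplier * std."""
    values = df[col]
    mean, std = values.mean(), values.std()
    lower, upper = mean - multiplier * std, mean + multiplier * std
    # Comparisons with NaN are False, so missing values are never flagged
    is_outlier = (values < lower) | (values > upper)
    return {
        "variable": col,
        "n_outliers": int(is_outlier.sum()),
        "share_outliers": float(is_outlier.mean()),
        "target_share_among_outliers":
            df.loc[is_outlier, target].value_counts(normalize=True).to_dict(),
    }


# Toy data: 999 ordinary incomes plus one extreme value with TARGET = 1
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": np.append(rng.normal(150_000, 20_000, 999), 5_000_000),
    "TARGET": np.append(np.zeros(999, dtype=int), 1),
})
summary = outlier_summary(toy, "AMT_INCOME_TOTAL")
print(summary)
```

A mean-and-standard-deviation rule is itself sensitive to extreme values, so quantile-based limits are a common alternative when distributions are heavily skewed.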

In [18]:
f_aux.get_deviation_of_mean_perc(df_loan_train, list_var_continuous, target='TARGET', multiplier=3)
Out[18]:
0.0 1.0 variable sum_outlier_values porcentaje_sum_null_values
0 0.954442 0.045558 COMMONAREA_AVG 1317 0.005353
1 0.953558 0.046442 COMMONAREA_MEDI 1335 0.005427
2 0.949962 0.050038 COMMONAREA_MODE 1319 0.005362
3 0.935264 0.064736 NONLIVINGAPARTMENTS_AVG 587 0.002386
4 0.931389 0.068611 NONLIVINGAPARTMENTS_MEDI 583 0.002370
5 0.925182 0.074818 NONLIVINGAPARTMENTS_MODE 548 0.002228
6 0.951567 0.048433 LIVINGAPARTMENTS_MEDI 1404 0.005707
7 0.953305 0.046695 LIVINGAPARTMENTS_AVG 1392 0.005658
8 0.950912 0.049088 LIVINGAPARTMENTS_MODE 1426 0.005797
9 0.963351 0.036649 FLOORSMIN_MODE 382 0.001553
10 0.963441 0.036559 FLOORSMIN_AVG 465 0.001890
11 0.961364 0.038636 FLOORSMIN_MEDI 440 0.001789
12 0.920969 0.079031 YEARS_BUILD_MODE 949 0.003858
13 0.921218 0.078782 YEARS_BUILD_MEDI 952 0.003870
14 0.920298 0.079702 YEARS_BUILD_AVG 941 0.003825
15 0.915503 0.084497 OWN_CAR_AGE 2722 0.011065
16 0.941418 0.058582 LANDAREA_MEDI 1707 0.006939
17 0.938360 0.061640 LANDAREA_AVG 1671 0.006792
18 0.937830 0.062170 LANDAREA_MODE 1705 0.006931
19 0.945765 0.054235 BASEMENTAREA_MODE 1641 0.006671
20 0.946727 0.053273 BASEMENTAREA_AVG 1558 0.006333
21 0.946599 0.053401 BASEMENTAREA_MEDI 1573 0.006394
22 0.944530 0.055470 NONLIVINGAREA_AVG 1947 0.007914
23 0.945399 0.054601 NONLIVINGAREA_MODE 1978 0.008040
24 0.945212 0.054788 NONLIVINGAREA_MEDI 1953 0.007939
25 0.957623 0.042377 ELEVATORS_MEDI 1935 0.007866
26 0.958355 0.041645 ELEVATORS_AVG 1945 0.007906
27 0.951776 0.048224 ELEVATORS_MODE 2675 0.010874
28 0.951858 0.048142 APARTMENTS_AVG 2368 0.009626
29 0.951209 0.048791 APARTMENTS_MODE 2398 0.009748
30 0.951300 0.048700 APARTMENTS_MEDI 2423 0.009849
31 0.937675 0.062325 ENTRANCES_MEDI 1781 0.007240
32 0.941121 0.058879 ENTRANCES_MODE 2106 0.008561
33 0.938453 0.061547 ENTRANCES_AVG 1771 0.007199
34 0.951859 0.048141 LIVINGAREA_AVG 2555 0.010386
35 0.949461 0.050539 LIVINGAREA_MODE 2691 0.010939
36 0.952177 0.047823 LIVINGAREA_MEDI 2572 0.010455
37 0.959376 0.040624 FLOORSMAX_MODE 2117 0.008605
38 0.957955 0.042045 FLOORSMAX_AVG 2093 0.008508
39 0.958106 0.041894 FLOORSMAX_MEDI 2196 0.008927
40 0.908411 0.091589 YEARS_BEGINEXPLUATATION_MODE 535 0.002175
41 0.910584 0.089416 YEARS_BEGINEXPLUATATION_AVG 548 0.002228
42 0.904854 0.095146 YEARS_BEGINEXPLUATATION_MEDI 515 0.002093
43 0.958506 0.041494 TOTALAREA_MODE 2651 0.010776
44 0.925448 0.074552 AMT_REQ_CREDIT_BUREAU_WEEK 6814 0.027698
45 0.946082 0.053918 AMT_REQ_CREDIT_BUREAU_MON 2578 0.010479
46 0.924266 0.075734 AMT_REQ_CREDIT_BUREAU_HOUR 1294 0.005260
47 0.914095 0.085905 AMT_REQ_CREDIT_BUREAU_DAY 1199 0.004874
48 0.907767 0.092233 AMT_REQ_CREDIT_BUREAU_YEAR 2678 0.010886
49 0.913961 0.086039 AMT_REQ_CREDIT_BUREAU_QRT 1848 0.007512
50 0.913829 0.086171 OBS_60_CNT_SOCIAL_CIRCLE 4816 0.019577
51 0.874327 0.125673 DEF_60_CNT_SOCIAL_CIRCLE 3159 0.012841
52 0.914217 0.085783 OBS_30_CNT_SOCIAL_CIRCLE 4966 0.020186
53 0.880065 0.119935 DEF_30_CNT_SOCIAL_CIRCLE 5503 0.022369
54 0.960299 0.039701 AMT_GOODS_PRICE 3350 0.013617
55 0.964621 0.035379 AMT_ANNUITY 2346 0.009536
56 0.901660 0.098340 CNT_FAM_MEMBERS 3193 0.012979
57 0.949119 0.050881 DAYS_LAST_PHONE_CHANGE 511 0.002077
58 0.957310 0.042690 AMT_CREDIT 2647 0.010760
59 0.942857 0.057143 AMT_INCOME_TOTAL 210 0.000854
60 0.958470 0.041530 REGION_POPULATION_RELATIVE 6718 0.027308
61 0.960848 0.039152 DAYS_REGISTRATION 613 0.002492
  • The variables worth highlighting are 'AMT_CREDIT', the total amount lent to the client, and 'AMT_INCOME_TOTAL', the client's total income, since both may be relatively important for 'TARGET'. Bearing in mind that roughly 8% of clients in the target variable have payment difficulties, the number of outliers is not a major concern. It should be kept in mind but, a priori, it should not affect the final conclusions, because outliers make up less than 3% of the rows for every variable in the table.

In addition, the share of outliers is very low in practically every variable and should not significantly affect the results, so for now I will keep them.
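
The helper `f_aux.get_deviation_of_mean_perc` lives in `funciones_auxiliares` and its code is not shown here. The sketch below is therefore an assumption about the rule it applies: a value counts as an outlier when it falls outside mean ± `multiplier` × standard deviation. The function name `outlier_summary`, its output column names, and the toy data are all hypothetical.

```python
import pandas as pd

def outlier_summary(df, columns, target="TARGET", multiplier=3):
    """Count values outside mean +/- multiplier*std for each column and
    report how TARGET is distributed among those rows (assumed rule)."""
    rows = []
    for col in columns:
        values = df[col].dropna()
        mean, std = values.mean(), values.std()
        lower, upper = mean - multiplier * std, mean + multiplier * std
        # Comparisons against NaN are False, so missing values are never flagged.
        mask = (df[col] < lower) | (df[col] > upper)
        n_out = int(mask.sum())
        if n_out == 0:
            continue
        target_share = df.loc[mask, target].value_counts(normalize=True)
        rows.append({
            "variable": col,
            "n_outliers": n_out,
            "pct_outliers": n_out / len(df),
            "target_0_share": target_share.get(0, 0.0),
            "target_1_share": target_share.get(1, 0.0),
        })
    return pd.DataFrame(rows)

# Toy data invented for illustration: 98 typical values and 2 extreme ones.
toy = pd.DataFrame({
    "AMT_DEMO": [10.0] * 98 + [1000.0, -1000.0],
    "TARGET":   [0] * 98 + [1, 0],
})
summary = outlier_summary(toy, ["AMT_DEMO"])
print(summary)
```

In the output above, the column labelled `porcentaje_sum_null_values` appears to hold the outlier count divided by the number of training rows rather than a share of nulls (for COMMONAREA_AVG, 1317 / 0.005353 is roughly 246,000 rows), which is what `pct_outliers` computes in this sketch.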

Correlation analysis between variables¶

Correlation matrix for numeric variables¶
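
The matrix computed in this section is wide, and the `_AVG`, `_MEDI` and `_MODE` variants of the housing features are nearly collinear in it (COMMONAREA_AVG and COMMONAREA_MEDI correlate at 0.995723, for example). One way to read such a matrix is to list only the pairs above a threshold. The sketch below is not part of the original notebook: `high_corr_pairs` is a hypothetical helper, the 0.9 threshold is an arbitrary assumption, and the small matrix is a four-variable excerpt of the values in the output.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(corr, threshold=0.9):
    """Pairs with |correlation| >= threshold, taken from the upper
    triangle so that each pair is listed once and the diagonal is skipped."""
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().reset_index()
    pairs.columns = ["var_1", "var_2", "corr"]
    # NaN entries (if any survive stacking) fail this comparison and drop out.
    pairs = pairs[pairs["corr"].abs() >= threshold]
    pairs = pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False)
    return pairs.reset_index(drop=True)

# Excerpt of the Pearson matrix for four variables.
names = ["COMMONAREA_AVG", "COMMONAREA_MEDI", "COMMONAREA_MODE", "OWN_CAR_AGE"]
excerpt = pd.DataFrame(
    [
        [1.000000, 0.995723, 0.976990, -0.038274],
        [0.995723, 1.000000, 0.980186, -0.037959],
        [0.976990, 0.980186, 1.000000, -0.032890],
        [-0.038274, -0.037959, -0.032890, 1.000000],
    ],
    index=names,
    columns=names,
)
pairs = high_corr_pairs(excerpt, threshold=0.9)
print(pairs)
```

Applied to the full `corr` object, the same call would give a compact list of candidate redundant variables. Which member of each pair to keep is a separate modelling decision.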

In [19]:
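# Pearson correlation of the numeric columns: drop those listed in df_loan_bool, then append TARGET
corr = pd.concat([df_loan_train.select_dtypes('number').drop(df_loan_bool, axis=1), df_loan_train['TARGET']], axis=1).corr(method='pearson')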
corr
Out[19]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_MODE FLOORSMIN_AVG FLOORSMIN_MEDI YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_MEDI ELEVATORS_AVG ELEVATORS_MODE APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_MEDI ENTRANCES_MODE ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI FLOORSMAX_MODE FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EXT_SOURCE_3 AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_YEAR AMT_REQ_CREDIT_BUREAU_QRT OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY CNT_FAM_MEMBERS DAYS_LAST_PHONE_CHANGE HOUR_APPR_PROCESS_START AMT_CREDIT AMT_INCOME_TOTAL CNT_CHILDREN REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY TARGET
SK_ID_CURR 1.000000 -0.000618 -0.000306 -0.000235 -0.004044 -0.004488 -0.003741 0.003474 0.003429 0.004078 0.003772 0.004766 0.004513 0.007333 0.007869 0.008118 0.000983 0.003210 0.002934 0.003035 -0.000687 -0.001218 -0.001100 -0.000098 0.002460 0.001556 0.001669 0.005666 0.005552 0.005788 0.001911 0.002158 0.002255 -0.002076 -0.002179 -0.002377 0.003940 0.004250 0.004374 0.005201 0.005760 0.005355 0.002445 0.002513 0.002298 0.003307 -0.000007 0.001299 0.000227 -0.002844 -0.001018 0.004930 -0.000050 -0.001489 0.000678 -0.001404 -0.000575 0.001123 0.000227 -0.000003 -0.002231 0.000776 0.000205 0.000214 -0.001795 -0.000688 0.001271 -0.000841 0.001274 -0.000630 -0.000887 -0.001853 -0.001741 -0.000581
COMMONAREA_AVG -0.000618 1.000000 0.995723 0.976990 0.104080 0.103623 0.101982 0.532262 0.530703 0.523296 0.287190 0.294760 0.294020 0.226436 0.229366 0.229621 -0.038274 0.254899 0.253077 0.240619 0.383025 0.401366 0.400316 0.032502 0.227503 0.215756 0.227223 0.518537 0.520095 0.501695 0.536826 0.511312 0.536078 0.322824 0.299515 0.325433 0.544066 0.519428 0.542972 0.395279 0.401736 0.400223 0.050956 0.095025 0.078857 0.550656 -0.005499 -0.009497 0.022451 0.006416 -0.000265 -0.014661 -0.010515 -0.020677 -0.014209 -0.021039 -0.012428 0.053179 0.049932 0.056695 0.000262 -0.002659 0.047662 0.049198 0.086203 -0.000503 0.168101 0.006585 -0.008967 0.024592 -0.000485 -0.120701 -0.130876 -0.021858
COMMONAREA_MEDI -0.000306 0.995723 1.000000 0.980186 0.104108 0.104648 0.103190 0.535049 0.531468 0.526520 0.285829 0.293205 0.292670 0.227598 0.230334 0.230275 -0.037959 0.257784 0.255666 0.244055 0.386034 0.402608 0.402752 0.031532 0.227796 0.217994 0.229016 0.520443 0.520169 0.503396 0.537877 0.514492 0.539034 0.325659 0.302568 0.327092 0.545263 0.522608 0.545882 0.394667 0.400655 0.399626 0.051044 0.095260 0.079089 0.550483 -0.005625 -0.009552 0.022149 0.006569 -0.000085 -0.014401 -0.010050 -0.020014 -0.013928 -0.020368 -0.012346 0.051516 0.048917 0.055852 0.000731 -0.002478 0.046151 0.048203 0.084201 -0.000145 0.163327 0.007296 -0.009276 0.025303 -0.000236 -0.117366 -0.127754 -0.021818
COMMONAREA_MODE -0.000235 0.976990 0.980186 1.000000 0.101379 0.102925 0.106578 0.524858 0.520722 0.535876 0.276881 0.275963 0.275386 0.224705 0.221103 0.221461 -0.032890 0.266620 0.263642 0.262352 0.402390 0.400002 0.402012 0.027070 0.220478 0.227433 0.224863 0.505053 0.503870 0.504118 0.527508 0.524077 0.529621 0.332168 0.321489 0.332897 0.534595 0.533735 0.536113 0.378377 0.376467 0.375441 0.049195 0.090068 0.074009 0.541181 -0.004424 -0.008405 0.019809 0.006513 0.000204 -0.013372 -0.009280 -0.016636 -0.013215 -0.016998 -0.011801 0.043665 0.041974 0.047572 0.000838 -0.000391 0.040003 0.041446 0.072656 -0.000906 0.134159 0.007584 -0.009378 0.025497 -0.000491 -0.095498 -0.107276 -0.019588
NONLIVINGAPARTMENTS_AVG -0.004044 0.104080 0.104108 0.101379 1.000000 0.988800 0.968168 0.155765 0.160672 0.142623 0.069839 0.073014 0.072611 0.071022 0.071933 0.072432 -0.027417 0.065266 0.063079 0.059149 0.091922 0.096333 0.095775 0.017335 0.217959 0.208738 0.216835 0.121778 0.121878 0.114281 0.196310 0.181238 0.192000 0.061096 0.052706 0.061623 0.136229 0.127699 0.135458 0.108526 0.113893 0.112877 0.020760 0.035872 0.032569 0.144837 0.009442 -0.003654 -0.000560 0.000469 -0.001643 0.001379 0.002805 -0.001056 -0.001319 -0.001377 0.001349 0.019233 0.014541 0.022276 0.002755 0.001123 0.014680 0.013413 0.030406 0.004179 0.024268 0.000849 -0.002721 0.035364 -0.008094 -0.018347 -0.021329 -0.003702
NONLIVINGAPARTMENTS_MEDI -0.004488 0.103623 0.104648 0.102925 0.988800 1.000000 0.979302 0.156919 0.155997 0.144498 0.068459 0.071046 0.071093 0.069883 0.070533 0.070675 -0.026958 0.062342 0.061601 0.057166 0.093564 0.096266 0.096361 0.016615 0.218406 0.211433 0.218793 0.121835 0.121147 0.115012 0.194906 0.184346 0.193856 0.062836 0.055111 0.062890 0.136504 0.129633 0.136379 0.107229 0.111649 0.111432 0.020289 0.034919 0.031826 0.144587 0.008861 -0.003997 -0.000965 0.000675 -0.001680 0.001970 0.003295 -0.000561 -0.000911 -0.000880 0.001888 0.018113 0.013412 0.021405 0.003062 0.001182 0.014174 0.012401 0.028913 0.004442 0.021699 0.000777 -0.002782 0.034240 -0.007466 -0.015891 -0.019139 -0.002904
NONLIVINGAPARTMENTS_MODE -0.003741 0.101982 0.103190 0.106578 0.968168 0.979302 1.000000 0.146722 0.145581 0.146565 0.067889 0.066070 0.066105 0.067812 0.066010 0.066057 -0.024587 0.062815 0.061931 0.062180 0.098268 0.095715 0.096317 0.015245 0.212423 0.214904 0.213502 0.116300 0.115512 0.115426 0.189573 0.186793 0.188568 0.065015 0.061913 0.064713 0.131247 0.132183 0.131377 0.101441 0.102779 0.102801 0.019254 0.032312 0.029473 0.139331 0.008848 -0.004205 -0.001375 -0.000420 -0.001305 0.002258 0.003143 -0.000231 -0.000246 -0.000553 0.003069 0.016875 0.010851 0.017211 0.002576 0.000882 0.012107 0.010076 0.025624 0.004294 0.016331 0.001163 -0.003421 0.032723 -0.007737 -0.010272 -0.014123 -0.001785
LIVINGAPARTMENTS_MEDI 0.003474 0.532262 0.535049 0.524858 0.155765 0.156919 0.146722 1.000000 0.993444 0.975784 0.433319 0.440796 0.439792 0.332742 0.333236 0.334181 -0.049922 0.425089 0.421164 0.415587 0.629270 0.650066 0.651839 0.043665 0.292987 0.276488 0.292079 0.816285 0.814531 0.801272 0.943828 0.916230 0.944156 0.567007 0.537489 0.568360 0.884652 0.858999 0.886539 0.584335 0.590479 0.588101 0.088925 0.153387 0.131092 0.847531 0.000900 -0.007432 0.032529 0.002651 0.003484 -0.013095 -0.008347 -0.028310 -0.016995 -0.028816 -0.015635 0.078604 0.061198 0.074110 -0.004163 -0.002901 0.078353 0.058731 0.105237 -0.005822 0.190426 0.013687 -0.020043 0.025284 0.000204 -0.152176 -0.176999 -0.025916
LIVINGAPARTMENTS_AVG 0.003429 0.530703 0.531468 0.520722 0.160672 0.155997 0.145581 0.993444 1.000000 0.970003 0.432860 0.441105 0.439306 0.330950 0.331908 0.333106 -0.050750 0.420759 0.417510 0.410504 0.624144 0.647704 0.646803 0.045369 0.292061 0.273629 0.289557 0.811084 0.813447 0.795752 0.945602 0.909630 0.936836 0.561134 0.531457 0.565461 0.881894 0.852971 0.879724 0.584088 0.591459 0.588124 0.088665 0.152964 0.130738 0.849248 0.001055 -0.007485 0.032595 0.002833 0.003390 -0.012730 -0.008789 -0.028427 -0.017006 -0.028928 -0.015667 0.080303 0.062989 0.076515 -0.004810 -0.003382 0.079959 0.060508 0.107432 -0.006488 0.195956 0.013299 -0.020296 0.024839 0.000710 -0.156766 -0.181184 -0.026580
LIVINGAPARTMENTS_MODE 0.004078 0.523296 0.526520 0.535876 0.142623 0.144498 0.146565 0.975784 0.970003 1.000000 0.431111 0.428039 0.427364 0.331959 0.324816 0.325796 -0.044679 0.436411 0.431860 0.438350 0.653435 0.648962 0.651692 0.038305 0.284769 0.287368 0.286799 0.800515 0.798802 0.809301 0.931941 0.939327 0.933477 0.573767 0.566713 0.575143 0.874249 0.879962 0.875901 0.573404 0.569560 0.567508 0.087476 0.148304 0.126255 0.834733 0.001906 -0.007015 0.030218 0.003853 0.003741 -0.012366 -0.008217 -0.025142 -0.017523 -0.025629 -0.016124 0.071318 0.054533 0.065992 -0.004381 -0.003171 0.072238 0.052481 0.092782 -0.006230 0.164517 0.013336 -0.019826 0.023973 0.000049 -0.129571 -0.155851 -0.024955
FLOORSMIN_MODE 0.003772 0.287190 0.285829 0.276881 0.069839 0.068459 0.067889 0.433319 0.432860 0.431111 1.000000 0.986275 0.988711 0.354088 0.352507 0.352567 -0.073863 0.152698 0.150064 0.149188 0.207001 0.220236 0.217340 0.067297 0.147103 0.136329 0.143140 0.500231 0.500414 0.496078 0.437226 0.424276 0.435556 0.034670 0.028875 0.037452 0.458830 0.444623 0.457790 0.727696 0.723655 0.724492 0.100572 0.168074 0.148876 0.446324 0.003778 -0.001291 0.035653 0.003737 0.003338 -0.008855 -0.004238 -0.035979 -0.022872 -0.036522 -0.025390 0.106986 0.076515 0.094729 -0.001186 -0.006971 0.113720 0.074611 0.130492 -0.009376 0.273877 0.000420 -0.013644 0.019499 -0.009859 -0.215123 -0.222929 -0.033119
FLOORSMIN_AVG 0.004766 0.294760 0.293205 0.275963 0.073014 0.071046 0.066070 0.440796 0.441105 0.428039 0.986275 1.000000 0.997300 0.352802 0.358981 0.359817 -0.076332 0.150093 0.147504 0.139141 0.199227 0.222760 0.219260 0.070879 0.153013 0.131155 0.146666 0.510074 0.511838 0.496145 0.445280 0.419621 0.442561 0.031725 0.016065 0.034497 0.467477 0.440933 0.465169 0.730044 0.743030 0.740669 0.101034 0.172300 0.152133 0.456486 0.002409 -0.001575 0.039477 0.003833 0.003686 -0.010269 -0.004978 -0.038168 -0.023657 -0.038671 -0.026169 0.112450 0.080338 0.100250 -0.002877 -0.007270 0.119442 0.078129 0.139013 -0.010143 0.292362 0.001133 -0.014006 0.020757 -0.009386 -0.229994 -0.236985 -0.033705
FLOORSMIN_MEDI 0.004513 0.294020 0.292670 0.275386 0.072611 0.071093 0.066105 0.439792 0.439306 0.427364 0.988711 0.997300 1.000000 0.353089 0.359400 0.359322 -0.076610 0.150355 0.147874 0.139709 0.198881 0.221369 0.218122 0.069767 0.152203 0.131145 0.146481 0.509386 0.509984 0.495428 0.443479 0.418987 0.441581 0.030663 0.015810 0.033887 0.465766 0.440049 0.464294 0.730901 0.740699 0.741322 0.100881 0.171914 0.152238 0.454403 0.002280 -0.000978 0.038721 0.003881 0.003681 -0.010540 -0.004967 -0.037967 -0.023556 -0.038457 -0.026158 0.111551 0.079628 0.098972 -0.002178 -0.007243 0.118550 0.077513 0.137605 -0.009670 0.288614 0.001302 -0.014512 0.020821 -0.009253 -0.227258 -0.234200 -0.033636
YEARS_BUILD_MODE 0.007333 0.226436 0.227598 0.224705 0.071022 0.069883 0.067812 0.332742 0.330950 0.331959 0.354088 0.352802 0.353089 1.000000 0.989634 0.989766 -0.043684 0.183206 0.181211 0.177669 0.243382 0.248353 0.247039 0.013757 0.125128 0.117844 0.124120 0.339740 0.338728 0.336509 0.337593 0.328968 0.337040 0.091311 0.085704 0.092882 0.352223 0.344393 0.352294 0.510358 0.508150 0.508380 0.302129 0.492266 0.438762 0.355397 0.014674 -0.006569 -0.004297 0.001198 0.001962 -0.020694 -0.006423 0.001401 -0.011099 0.001537 -0.010162 0.007695 0.038318 0.030641 0.041360 0.011749 -0.016409 0.033075 0.038279 0.029196 -0.064028 0.025823 -0.006851 0.163429 -0.009393 0.048298 0.040781 -0.025586
YEARS_BUILD_MEDI 0.007869 0.229366 0.230334 0.221103 0.071933 0.070533 0.066010 0.333236 0.331908 0.324816 0.352507 0.358981 0.359400 0.989634 1.000000 0.998634 -0.044703 0.179507 0.178186 0.168604 0.232960 0.246902 0.244974 0.014411 0.127607 0.112985 0.124902 0.342223 0.341850 0.333069 0.338268 0.321492 0.337094 0.085301 0.072267 0.087948 0.353487 0.337070 0.352896 0.511288 0.517014 0.517300 0.299885 0.497321 0.443892 0.357755 0.015024 -0.006244 -0.004164 0.001142 0.003460 -0.021299 -0.007438 0.000646 -0.011636 0.000839 -0.010555 0.010393 0.039981 0.032850 0.041839 0.011615 -0.014470 0.034655 0.042482 0.029595 -0.058163 0.027171 -0.007974 0.164861 -0.009253 0.043189 0.036414 -0.025933
YEARS_BUILD_AVG 0.008118 0.229621 0.230275 0.221461 0.072432 0.070675 0.066057 0.334181 0.333106 0.325796 0.352567 0.359817 0.359322 0.989766 0.998634 1.000000 -0.044935 0.179786 0.178408 0.168751 0.233523 0.247753 0.245664 0.014988 0.127629 0.112896 0.124702 0.342611 0.342693 0.333591 0.339399 0.322318 0.337949 0.086125 0.072993 0.088597 0.354490 0.337875 0.353607 0.511258 0.518305 0.517313 0.299906 0.497986 0.443345 0.359051 0.015181 -0.006283 -0.004172 0.001230 0.003057 -0.021440 -0.007304 0.000507 -0.011478 0.000709 -0.010424 0.010791 0.040326 0.033351 0.041869 0.011920 -0.014282 0.034931 0.042782 0.029646 -0.057069 0.026899 -0.007603 0.165196 -0.009454 0.042167 0.035435 -0.025685
OWN_CAR_AGE 0.000983 -0.038274 -0.037959 -0.032890 -0.027417 -0.026958 -0.024587 -0.049922 -0.050750 -0.044679 -0.073863 -0.076332 -0.076610 -0.043684 -0.044703 -0.044935 1.000000 -0.021395 -0.021384 -0.019544 -0.026777 -0.032436 -0.031287 -0.081396 -0.032484 -0.028765 -0.032077 -0.065794 -0.066436 -0.061457 -0.051160 -0.045620 -0.049992 -0.016462 -0.012329 -0.017163 -0.059801 -0.054950 -0.058606 -0.080548 -0.082869 -0.082545 0.001837 -0.000012 0.000043 -0.061077 -0.013837 0.003276 -0.022521 0.003907 -0.006480 -0.015641 -0.017527 0.005161 0.011677 0.005222 0.007421 -0.081239 -0.106258 -0.099371 -0.015176 0.002689 -0.069504 -0.096874 -0.119654 0.009539 -0.082891 0.007699 0.028075 -0.025165 0.008747 0.086297 0.087654 0.039531
LANDAREA_MEDI 0.003210 0.254899 0.257784 0.266620 0.065266 0.062342 0.062815 0.425089 0.420759 0.436411 0.152698 0.150093 0.150355 0.183206 0.179507 0.179786 -0.021395 1.000000 0.990884 0.981228 0.475542 0.471256 0.472674 0.005099 0.161373 0.162151 0.164334 0.378455 0.376991 0.380360 0.498729 0.500957 0.500756 0.511590 0.502198 0.512221 0.503788 0.505662 0.504540 0.220507 0.217760 0.217653 0.054186 0.076599 0.071351 0.493214 0.009260 0.005231 0.011826 -0.001021 0.005569 -0.011681 0.006480 -0.003551 -0.001748 -0.003813 -0.002895 0.021615 0.011375 0.005896 0.000430 -0.000237 0.014274 0.004690 -0.002390 -0.004147 -0.053101 0.004539 -0.011408 0.003442 -0.005515 0.046965 0.037945 -0.013984
LANDAREA_AVG 0.002934 0.253077 0.255666 0.263642 0.063079 0.061601 0.061931 0.421164 0.417510 0.431860 0.150064 0.147504 0.147874 0.181211 0.178186 0.178408 -0.021384 0.990884 1.000000 0.972972 0.470694 0.468224 0.469202 0.005084 0.160731 0.160244 0.162543 0.375876 0.375158 0.377513 0.495934 0.496410 0.496969 0.507562 0.497818 0.508848 0.501159 0.501364 0.501161 0.219714 0.216961 0.216819 0.053952 0.076331 0.071097 0.491015 0.009236 0.007634 0.012075 -0.001104 0.005682 -0.012393 0.006054 -0.003694 -0.001492 -0.003964 -0.002509 0.022506 0.011802 0.006374 0.000102 0.000591 0.014503 0.005175 -0.002143 -0.004457 -0.051987 0.004210 -0.011420 0.003438 -0.005355 0.045123 0.036342 -0.013539
LANDAREA_MODE 0.003035 0.240619 0.244055 0.262352 0.059149 0.057166 0.062180 0.415587 0.410504 0.438350 0.149188 0.139141 0.139709 0.177669 0.168604 0.168751 -0.019544 0.981228 0.972972 1.000000 0.484460 0.464144 0.466461 0.003333 0.154759 0.168565 0.159405 0.364891 0.362521 0.380411 0.487670 0.508529 0.490293 0.511949 0.518244 0.511956 0.491731 0.513547 0.493367 0.212257 0.202091 0.202247 0.052933 0.072452 0.067231 0.479343 0.008100 0.005646 0.010784 -0.000234 0.005862 -0.010501 0.006728 -0.002552 -0.002895 -0.002832 -0.003729 0.017290 0.007835 0.001457 0.001572 -0.000183 0.011613 0.001402 -0.004020 -0.003953 -0.061096 0.004763 -0.010425 0.004006 -0.005961 0.058796 0.048524 -0.012519
BASEMENTAREA_MODE -0.000687 0.383025 0.386034 0.402390 0.091922 0.093564 0.098268 0.629270 0.624144 0.653435 0.207001 0.199227 0.198881 0.243382 0.232960 0.233523 -0.026777 0.475542 0.470694 0.484460 1.000000 0.975291 0.978262 0.033933 0.254772 0.270229 0.259540 0.539354 0.538022 0.552293 0.660366 0.678423 0.662998 0.651956 0.653784 0.652995 0.673436 0.690212 0.672895 0.308825 0.298168 0.297787 0.059862 0.083918 0.076458 0.648240 0.004110 -0.002767 0.019158 -0.000325 0.004118 -0.011166 -0.002863 -0.010674 -0.011675 -0.011010 -0.009459 0.037158 0.037724 0.036378 -0.004981 -0.005732 0.034527 0.033595 0.011618 -0.009291 0.066314 -0.002691 -0.000176 -0.018812 -0.011839 -0.032146 -0.046738 -0.021323
BASEMENTAREA_AVG -0.001218 0.401366 0.402608 0.400002 0.096333 0.096266 0.095715 0.650066 0.647704 0.648962 0.220236 0.222760 0.221369 0.248353 0.246902 0.247753 -0.032436 0.471256 0.468224 0.464144 0.975291 1.000000 0.995783 0.039123 0.263714 0.258374 0.262911 0.561617 0.563952 0.554458 0.679918 0.667089 0.678909 0.647729 0.627240 0.651806 0.693521 0.678494 0.690215 0.328630 0.329492 0.327642 0.061477 0.089229 0.081782 0.673316 0.005423 -0.002262 0.020907 -0.001259 0.004760 -0.012728 -0.003567 -0.015154 -0.013251 -0.015466 -0.010879 0.047843 0.045509 0.046552 -0.005527 -0.006458 0.041399 0.041226 0.015454 -0.009050 0.098987 -0.002384 -0.001224 -0.020079 -0.012849 -0.061396 -0.074168 -0.023834
BASEMENTAREA_MEDI -0.001100 0.400316 0.402752 0.402012 0.095775 0.096361 0.096317 0.651839 0.646803 0.651692 0.217340 0.219260 0.218122 0.247039 0.244974 0.245664 -0.031287 0.472674 0.469202 0.466461 0.978262 0.995783 1.000000 0.038267 0.262984 0.260387 0.264363 0.561737 0.561484 0.554872 0.678683 0.668909 0.680415 0.651333 0.631156 0.652724 0.692532 0.680624 0.691541 0.325050 0.325273 0.323735 0.060960 0.088566 0.081095 0.669533 0.005378 -0.002666 0.021017 -0.001086 0.005041 -0.012201 -0.003754 -0.014443 -0.012850 -0.014743 -0.010474 0.046458 0.043617 0.044472 -0.005794 -0.007090 0.040947 0.039395 0.014711 -0.009238 0.094199 -0.002358 -0.001120 -0.020656 -0.013046 -0.057318 -0.070167 -0.023122
EXT_SOURCE_1 -0.000098 0.032502 0.031532 0.027070 0.017335 0.016615 0.015245 0.043665 0.045369 0.038305 0.067297 0.070879 0.069767 0.013757 0.014411 0.014988 -0.081396 0.005099 0.005084 0.003333 0.033933 0.039123 0.038267 1.000000 0.030153 0.024627 0.028812 0.070254 0.071731 0.066530 0.051169 0.045355 0.049499 0.019566 0.016316 0.020374 0.065437 0.060149 0.064241 0.086689 0.089897 0.088767 -0.002320 -0.001223 -0.000907 0.063785 0.185211 -0.002503 0.031976 -0.006640 -0.004104 0.005301 -0.002403 -0.026333 -0.030973 -0.026887 -0.028715 0.213917 0.174615 0.119410 -0.096102 -0.130211 0.032487 0.167599 0.023251 -0.138459 0.098941 -0.598890 0.289068 -0.178719 -0.132527 -0.113677 -0.113373 -0.155781
NONLIVINGAREA_AVG 0.002460 0.227503 0.227796 0.220478 0.217959 0.218406 0.212423 0.292987 0.292061 0.284769 0.147103 0.153013 0.152203 0.125128 0.127607 0.127629 -0.032484 0.161373 0.160731 0.154759 0.254772 0.263714 0.262984 0.030153 1.000000 0.966617 0.990679 0.279937 0.282617 0.274017 0.298349 0.285919 0.295956 0.161830 0.155001 0.164597 0.300217 0.285037 0.296622 0.248043 0.253370 0.252478 -0.008654 0.012008 0.013086 0.365713 -0.002831 -0.007740 0.012384 0.002492 0.001485 -0.009466 -0.002690 -0.017470 -0.013272 -0.017583 -0.013243 0.045519 0.044956 0.054684 0.004982 -0.004054 0.044565 0.040894 0.077089 0.003166 0.076143 0.004914 -0.014019 0.052079 0.001327 -0.082002 -0.082463 -0.012034
NONLIVINGAREA_MODE 0.001556 0.215756 0.217994 0.227433 0.208738 0.211433 0.214904 0.276488 0.273629 0.287368 0.136329 0.131155 0.131145 0.117844 0.112985 0.112896 -0.028765 0.162151 0.160244 0.168565 0.270229 0.258374 0.260387 0.024627 0.966617 1.000000 0.976036 0.264544 0.263841 0.273111 0.282916 0.292644 0.283986 0.169134 0.175061 0.169416 0.283470 0.293653 0.282900 0.231310 0.225307 0.225592 -0.004004 0.010092 0.008034 0.345283 -0.003298 -0.007128 0.009529 0.002035 0.000482 -0.007350 -0.001136 -0.013123 -0.012010 -0.013260 -0.011674 0.037709 0.039309 0.045902 0.005223 -0.004427 0.038635 0.035297 0.064064 0.002664 0.051838 0.004090 -0.012948 0.049933 0.000026 -0.059503 -0.060918 -0.010751
NONLIVINGAREA_MEDI 0.001669 0.227223 0.229016 0.224863 0.216835 0.218793 0.213502 0.292079 0.289557 0.286799 0.143140 0.146666 0.146481 0.124120 0.124902 0.124702 -0.032077 0.164334 0.162543 0.159405 0.259540 0.262911 0.264363 0.028812 0.990679 0.976036 1.000000 0.279554 0.279643 0.274803 0.296336 0.288648 0.296514 0.164696 0.159249 0.165812 0.298327 0.288660 0.296795 0.243800 0.247064 0.247169 -0.009434 0.010814 0.012117 0.360934 -0.003549 -0.007969 0.011464 0.001938 0.000551 -0.008823 -0.002270 -0.015676 -0.012892 -0.015779 -0.012713 0.043267 0.042839 0.051977 0.005025 -0.004454 0.043368 0.038741 0.073302 0.003049 0.067917 0.005578 -0.014081 0.052898 0.001627 -0.075075 -0.075988 -0.011442
ELEVATORS_MEDI 0.005666 0.518537 0.520443 0.505053 0.121778 0.121835 0.116300 0.816285 0.811084 0.800515 0.500231 0.510074 0.509386 0.339740 0.342223 0.342611 -0.065794 0.378455 0.375876 0.364891 0.539354 0.561617 0.561737 0.070254 0.279937 0.264544 0.279554 1.000000 0.995951 0.982569 0.834284 0.807492 0.836612 0.403318 0.378022 0.403711 0.865624 0.840316 0.868147 0.669392 0.676676 0.676016 0.073841 0.079690 0.078694 0.838507 0.006905 -0.003133 0.040722 0.000570 0.002988 -0.016773 -0.004685 -0.034902 -0.023556 -0.035381 -0.022742 0.113715 0.083950 0.102732 0.000133 -0.011752 0.105367 0.081052 0.039690 -0.005835 0.275174 -0.000223 -0.008678 0.000790 -0.010731 -0.221633 -0.233193 -0.035791
ELEVATORS_AVG 0.005552 0.520095 0.520169 0.503870 0.121878 0.121147 0.115512 0.814531 0.813447 0.798802 0.500414 0.511838 0.509984 0.338728 0.341850 0.342693 -0.066436 0.376991 0.375158 0.362521 0.538022 0.563952 0.561484 0.071731 0.282617 0.263841 0.279643 0.995951 1.000000 0.978454 0.836059 0.804616 0.833627 0.400296 0.374157 0.403905 0.867534 0.838036 0.865341 0.671129 0.680446 0.678042 0.073437 0.079682 0.078568 0.845008 0.006803 -0.003030 0.040755 0.000927 0.003282 -0.017063 -0.005053 -0.035805 -0.024005 -0.036295 -0.023109 0.115388 0.085197 0.104433 -0.000277 -0.011418 0.106407 0.082385 0.040491 -0.006032 0.281380 -0.000371 -0.008651 -0.000080 -0.010767 -0.227037 -0.238425 -0.036381
ELEVATORS_MODE 0.005788 0.501695 0.503396 0.504118 0.114281 0.115012 0.115426 0.801272 0.795752 0.809301 0.496078 0.496145 0.495428 0.336509 0.333069 0.333591 -0.061457 0.380360 0.377513 0.380411 0.552293 0.554458 0.554872 0.066530 0.274017 0.273111 0.274803 0.982569 0.978454 1.000000 0.821441 0.825110 0.824538 0.402745 0.401050 0.402711 0.851998 0.855616 0.855201 0.661175 0.656435 0.655700 0.077213 0.079978 0.079048 0.820160 0.006742 -0.002594 0.038436 0.000721 0.003161 -0.015946 -0.004350 -0.031987 -0.023458 -0.032493 -0.022422 0.106503 0.079855 0.096243 0.001385 -0.010413 0.099335 0.076927 0.036894 -0.005723 0.252585 -0.000107 -0.008199 0.001957 -0.010496 -0.201538 -0.213864 -0.034306
APARTMENTS_AVG 0.001911 0.536826 0.537877 0.527508 0.196310 0.194906 0.189573 0.943828 0.945602 0.931941 0.437226 0.445280 0.443479 0.337593 0.338268 0.339399 -0.051160 0.498729 0.495934 0.487670 0.660366 0.679918 0.678683 0.051169 0.298349 0.282916 0.296336 0.834284 0.836059 0.821441 1.000000 0.972824 0.995015 0.606101 0.581537 0.609724 0.914218 0.893655 0.913113 0.613988 0.618444 0.616228 0.096675 0.101424 0.100973 0.892090 0.003258 -0.003478 0.034102 0.001789 0.004611 -0.015733 -0.002850 -0.024016 -0.016403 -0.024522 -0.013851 0.090343 0.067394 0.079158 -0.010062 -0.008792 0.083651 0.063280 0.031310 -0.012330 0.206390 0.006776 -0.017006 0.013472 -0.006499 -0.152610 -0.172048 -0.031644
APARTMENTS_MODE 0.002158 0.511312 0.514492 0.524077 0.181238 0.184346 0.186793 0.916230 0.909630 0.939327 0.424276 0.419621 0.418987 0.328968 0.321492 0.322318 -0.045620 0.500957 0.496410 0.508529 0.678423 0.667089 0.668909 0.045355 0.285919 0.292644 0.288648 0.807492 0.804616 0.825110 0.972824 1.000000 0.976870 0.610962 0.614574 0.610929 0.890888 0.911286 0.894377 0.595375 0.585385 0.584368 0.101931 0.101663 0.100944 0.862033 0.002695 -0.003112 0.031627 0.002413 0.004402 -0.013849 -0.002504 -0.020334 -0.016230 -0.020814 -0.013413 0.079769 0.059877 0.068941 -0.008183 -0.007959 0.074785 0.055799 0.027137 -0.011344 0.175433 0.006669 -0.015507 0.013142 -0.006220 -0.123397 -0.143996 -0.029427
APARTMENTS_MEDI 0.002255 0.536078 0.539034 0.529621 0.192000 0.193856 0.188568 0.944156 0.936836 0.933477 0.435556 0.442561 0.441581 0.337040 0.337094 0.337949 -0.049992 0.500756 0.496969 0.490293 0.662998 0.678909 0.680415 0.049499 0.295956 0.283986 0.296514 0.836612 0.833627 0.824538 0.995015 0.976870 1.000000 0.610141 0.586601 0.610304 0.913205 0.896485 0.916740 0.612051 0.614861 0.613871 0.096835 0.101344 0.101165 0.886104 0.002998 -0.003533 0.033987 0.001865 0.004587 -0.015310 -0.002789 -0.023605 -0.016286 -0.024106 -0.013698 0.088616 0.065768 0.077002 -0.009991 -0.009078 0.082497 0.061599 0.030678 -0.012264 0.201838 0.006985 -0.016718 0.013577 -0.006452 -0.148410 -0.167831 -0.031137
ENTRANCES_MEDI -0.002076 0.322824 0.325659 0.332168 0.061096 0.062836 0.065015 0.567007 0.561134 0.573767 0.034670 0.031725 0.030663 0.091311 0.085301 0.086125 -0.016462 0.511590 0.507562 0.511949 0.651956 0.647729 0.651333 0.019566 0.161830 0.169134 0.164696 0.403318 0.400296 0.402745 0.606101 0.610962 0.610141 1.000000 0.980457 0.996902 0.615481 0.622494 0.619575 0.086672 0.083234 0.081517 0.037591 0.041857 0.040780 0.587397 0.008871 -0.000142 0.013349 -0.002721 0.006788 -0.010360 -0.000025 0.000122 -0.004201 -0.000143 -0.000596 0.031061 0.017277 0.012701 -0.003046 -0.012220 0.021172 0.013505 0.004576 -0.006975 0.033167 -0.008534 0.002773 -0.062268 -0.013221 -0.021531 -0.028446 -0.020116
ENTRANCES_MODE -0.002179 0.299515 0.302568 0.321489 0.052706 0.055111 0.061913 0.537489 0.531457 0.566713 0.028875 0.016065 0.015810 0.085704 0.072267 0.072993 -0.012329 0.502198 0.497818 0.518244 0.653784 0.627240 0.631156 0.016316 0.155001 0.175061 0.159249 0.378022 0.374157 0.401050 0.581537 0.614574 0.586601 0.980457 1.000000 0.977574 0.590724 0.623561 0.595389 0.076702 0.061508 0.060785 0.036312 0.038011 0.036758 0.559452 0.008033 -0.000065 0.011055 -0.002112 0.005709 -0.008534 0.000241 0.002181 -0.004933 0.001942 -0.001354 0.023618 0.013022 0.006746 -0.001079 -0.011140 0.016610 0.009157 0.002139 -0.005575 0.015755 -0.008220 0.003498 -0.059319 -0.012944 -0.004438 -0.012132 -0.018407
ENTRANCES_AVG -0.002377 0.325433 0.327092 0.332897 0.061623 0.062890 0.064713 0.568360 0.565461 0.575143 0.037452 0.034497 0.033887 0.092882 0.087948 0.088597 -0.017163 0.512221 0.508848 0.511956 0.652995 0.651806 0.652724 0.020374 0.164597 0.169416 0.165812 0.403711 0.403905 0.402711 0.609724 0.610929 0.610304 0.996902 0.977574 1.000000 0.619383 0.623247 0.620071 0.091075 0.087422 0.086365 0.038050 0.042632 0.041513 0.594085 0.009025 0.000257 0.013035 -0.002917 0.006717 -0.010426 -0.000196 -0.000307 -0.004412 -0.000566 -0.000895 0.032358 0.018333 0.014063 -0.002855 -0.012022 0.021492 0.014622 0.005134 -0.006867 0.036256 -0.008986 0.002734 -0.062525 -0.013075 -0.023626 -0.030790 -0.020484
LIVINGAREA_AVG 0.003940 0.544066 0.545263 0.534595 0.136229 0.136504 0.131247 0.884652 0.881894 0.874249 0.458830 0.467477 0.465766 0.352223 0.353487 0.354490 -0.059801 0.503788 0.501159 0.491731 0.673436 0.693521 0.692532 0.065437 0.300217 0.283470 0.298327 0.865624 0.867534 0.851998 0.914218 0.890888 0.913205 0.615481 0.590724 0.619383 1.000000 0.971389 0.995427 0.625755 0.630360 0.628319 0.078552 0.095967 0.092702 0.926029 0.003755 -0.004576 0.034536 0.001753 0.005269 -0.018721 -0.002850 -0.026816 -0.017731 -0.027248 -0.015859 0.096877 0.078335 0.091897 -0.003996 -0.011096 0.084724 0.073658 0.035924 -0.009387 0.214648 0.001366 -0.012905 0.007223 -0.010633 -0.164884 -0.183204 -0.035242
LIVINGAREA_MODE 0.004250 0.519428 0.522608 0.533735 0.127699 0.129633 0.132183 0.858999 0.852971 0.879962 0.444623 0.440933 0.440049 0.344393 0.337070 0.337875 -0.054950 0.505662 0.501364 0.513547 0.690212 0.678494 0.680624 0.060149 0.285037 0.293653 0.288660 0.840316 0.838036 0.855616 0.893655 0.911286 0.896485 0.622494 0.623561 0.623247 0.971389 1.000000 0.974366 0.605886 0.596739 0.595832 0.077018 0.092394 0.088537 0.899386 0.003765 -0.003888 0.031676 0.002418 0.005215 -0.017109 -0.001990 -0.021698 -0.016989 -0.022150 -0.014828 0.085227 0.070392 0.081540 -0.002208 -0.009979 0.075350 0.065696 0.031260 -0.008615 0.182047 0.001559 -0.011724 0.008007 -0.010999 -0.133665 -0.153244 -0.032972
LIVINGAREA_MEDI 0.004374 0.542972 0.545882 0.536113 0.135458 0.136379 0.131377 0.886539 0.879724 0.875901 0.457790 0.465169 0.464294 0.352294 0.352896 0.353607 -0.058606 0.504540 0.501161 0.493367 0.672895 0.690215 0.691541 0.064241 0.296622 0.282900 0.296795 0.868147 0.865341 0.855201 0.913113 0.894377 0.916740 0.619575 0.595389 0.620071 0.995427 0.974366 1.000000 0.623908 0.626875 0.626008 0.078318 0.095415 0.092450 0.920828 0.003507 -0.004357 0.034514 0.001925 0.005444 -0.018801 -0.002907 -0.026049 -0.017193 -0.026479 -0.015416 0.095325 0.077267 0.090548 -0.003830 -0.011248 0.083559 0.072571 0.035275 -0.009594 0.210470 0.001903 -0.013176 0.007687 -0.010498 -0.161042 -0.179341 -0.034857
FLOORSMAX_MODE 0.005201 0.395279 0.394667 0.378377 0.108526 0.107229 0.101441 0.584335 0.584088 0.573404 0.727696 0.730044 0.730901 0.510358 0.511288 0.511258 -0.080548 0.220507 0.219714 0.212257 0.308825 0.328630 0.325050 0.086689 0.248043 0.231310 0.243800 0.669392 0.671129 0.661175 0.613988 0.595375 0.612051 0.086672 0.076702 0.091075 0.625755 0.605886 0.623908 1.000000 0.985669 0.988201 0.109787 0.130294 0.128123 0.626085 0.003030 -0.003126 0.041317 0.001317 0.001660 -0.018166 0.000556 -0.039061 -0.030220 -0.039367 -0.030381 0.129425 0.105551 0.128378 -0.001571 -0.006165 0.113995 0.100857 0.052066 -0.009677 0.303690 0.001685 -0.014106 0.049158 -0.011584 -0.219861 -0.237230 -0.045368
FLOORSMAX_AVG 0.005760 0.401736 0.400655 0.376467 0.113893 0.111649 0.102779 0.590479 0.591459 0.569560 0.723655 0.743030 0.740699 0.508150 0.517014 0.518305 -0.082869 0.217760 0.216961 0.202091 0.298168 0.329492 0.325273 0.089897 0.253370 0.225307 0.247064 0.676676 0.680446 0.656435 0.618444 0.585385 0.614861 0.083234 0.061508 0.087422 0.630360 0.596739 0.626875 0.985669 1.000000 0.997059 0.107363 0.131014 0.129041 0.633646 0.002101 -0.003560 0.043776 0.001105 0.002235 -0.018978 -0.000114 -0.040739 -0.030484 -0.041030 -0.030619 0.135144 0.108699 0.132397 -0.002280 -0.006622 0.119406 0.103899 0.054379 -0.009643 0.322096 0.002227 -0.014993 0.049425 -0.011297 -0.235021 -0.251429 -0.046041
FLOORSMAX_MEDI 0.005355 0.400223 0.399626 0.375441 0.112877 0.111432 0.102801 0.588101 0.588124 0.567508 0.724492 0.740669 0.741322 0.508380 0.517300 0.517313 -0.082545 0.217653 0.216819 0.202247 0.297787 0.327642 0.323735 0.088767 0.252478 0.225592 0.247169 0.676016 0.678042 0.655700 0.616228 0.584368 0.613871 0.081517 0.060785 0.086365 0.628319 0.595832 0.626008 0.988201 0.997059 1.000000 0.107193 0.130742 0.129143 0.630983 0.002460 -0.003372 0.043082 0.001254 0.002082 -0.019034 0.000021 -0.040378 -0.030446 -0.040664 -0.030693 0.133912 0.108049 0.131363 -0.001868 -0.006550 0.118014 0.103290 0.053956 -0.009383 0.317838 0.002280 -0.015050 0.049661 -0.011444 -0.231637 -0.248063 -0.045861
YEARS_BEGINEXPLUATATION_MODE 0.002445 0.050956 0.051044 0.049195 0.020760 0.020289 0.019254 0.088925 0.088665 0.087476 0.100572 0.101034 0.100881 0.302129 0.299885 0.299906 0.001837 0.054186 0.053952 0.052933 0.059862 0.061477 0.060960 -0.002320 -0.008654 -0.004004 -0.009434 0.073841 0.073437 0.077213 0.096675 0.101931 0.096835 0.037591 0.036312 0.038050 0.078552 0.077018 0.078318 0.109787 0.107363 0.107193 1.000000 0.972994 0.966071 0.099119 -0.002873 0.002492 -0.000614 0.003927 -0.000412 -0.007690 0.001574 -0.000131 -0.003751 -0.000038 -0.003759 0.007867 0.006882 0.015115 0.007266 0.001918 -0.011315 0.005819 0.005204 0.006001 -0.006707 0.001740 0.008376 0.010382 -0.001100 0.004547 -0.000838 -0.009553
YEARS_BEGINEXPLUATATION_AVG 0.002513 0.095025 0.095260 0.090068 0.035872 0.034919 0.032312 0.153387 0.152964 0.148304 0.168074 0.172300 0.171914 0.492266 0.497321 0.497986 -0.000012 0.076599 0.076331 0.072452 0.083918 0.089229 0.088566 -0.001223 0.012008 0.010092 0.010814 0.079690 0.079682 0.079978 0.101424 0.101663 0.101344 0.041857 0.038011 0.042632 0.095967 0.092394 0.095415 0.130294 0.131014 0.130742 0.972994 1.000000 0.994221 0.101522 -0.003130 0.003277 -0.001142 0.003716 0.000386 -0.008031 0.001664 -0.000455 -0.005138 -0.000371 -0.005337 0.008709 0.008124 0.015545 0.007922 0.003131 -0.010619 0.007028 0.005564 0.006926 -0.006570 0.002015 0.008846 0.012817 -0.002182 0.004508 -0.000733 -0.010557
YEARS_BEGINEXPLUATATION_MEDI 0.002298 0.078857 0.079089 0.074009 0.032569 0.031826 0.029473 0.131092 0.130738 0.126255 0.148876 0.152133 0.152238 0.438762 0.443892 0.443345 0.000043 0.071351 0.071097 0.067231 0.076458 0.081782 0.081095 -0.000907 0.013086 0.008034 0.012117 0.078694 0.078568 0.079048 0.100973 0.100944 0.101165 0.040780 0.036758 0.041513 0.092702 0.088537 0.092450 0.128123 0.129041 0.129143 0.966071 0.994221 1.000000 0.100343 -0.002702 0.002431 -0.000934 0.003707 0.000400 -0.007979 0.001780 -0.000366 -0.005232 -0.000286 -0.005390 0.008639 0.007677 0.015242 0.007665 0.003533 -0.010170 0.006500 0.005571 0.006571 -0.006542 0.002051 0.008620 0.012831 -0.001754 0.004512 -0.000644 -0.010934
TOTALAREA_MODE 0.003307 0.550656 0.550483 0.541181 0.144837 0.144587 0.139331 0.847531 0.849248 0.834733 0.446324 0.456486 0.454403 0.355397 0.357755 0.359051 -0.061077 0.493214 0.491015 0.479343 0.648240 0.673316 0.669533 0.063785 0.365713 0.345283 0.360934 0.838507 0.845008 0.820160 0.892090 0.862033 0.886104 0.587397 0.559452 0.594085 0.926029 0.899386 0.920828 0.626085 0.633646 0.630983 0.099119 0.101522 0.100343 1.000000 0.004567 -0.003647 0.033840 0.002358 0.005557 -0.018790 -0.003923 -0.027016 -0.018859 -0.027462 -0.017370 0.094737 0.078645 0.092692 -0.001398 -0.008522 0.080780 0.074399 0.037922 -0.006763 0.203455 0.002688 -0.014987 0.019829 -0.010000 -0.161519 -0.178946 -0.035540
EXT_SOURCE_3 -0.000007 -0.005499 -0.005625 -0.004424 0.009442 0.008861 0.008848 0.000900 0.001055 0.001906 0.003778 0.002409 0.002280 0.014674 0.015024 0.015181 -0.013837 0.009260 0.009236 0.008100 0.004110 0.005423 0.005378 0.185211 -0.002831 -0.003298 -0.003549 0.006905 0.006803 0.006742 0.003258 0.002695 0.002998 0.008871 0.008033 0.009025 0.003755 0.003765 0.003507 0.003030 0.002101 0.002460 -0.002873 -0.003130 -0.002702 0.004567 1.000000 -0.020485 -0.008664 -0.001117 -0.008654 -0.072853 -0.023523 -0.000080 -0.034924 0.000248 -0.038208 0.109183 0.047128 0.029045 -0.029311 -0.075542 -0.040533 0.043049 -0.029240 -0.043570 -0.006362 -0.206463 0.114225 -0.106684 -0.131930 -0.012732 -0.012105 -0.180865
AMT_REQ_CREDIT_BUREAU_WEEK 0.001299 -0.009497 -0.009552 -0.008405 -0.003654 -0.003997 -0.004205 -0.007432 -0.007485 -0.007015 -0.001291 -0.001575 -0.000978 -0.006569 -0.006244 -0.006283 0.003276 0.005231 0.007634 0.005646 -0.002767 -0.002262 -0.002666 -0.002503 -0.007740 -0.007128 -0.007969 -0.003133 -0.003030 -0.002594 -0.003478 -0.003112 -0.003533 -0.000142 -0.000065 0.000257 -0.004576 -0.003888 -0.004357 -0.003126 -0.003560 -0.003372 0.002492 0.003277 0.002431 -0.003647 -0.020485 1.000000 -0.014782 0.004792 0.221089 0.016939 -0.014195 -0.001789 -0.003369 -0.001919 -0.003194 0.001740 -0.001594 0.013018 -0.002436 -0.002318 -0.004517 -0.001802 0.001770 -0.003201 -0.003104 -0.000823 0.002864 -0.001097 -0.002042 0.003056 0.002039 -0.001428
AMT_REQ_CREDIT_BUREAU_MON 0.000227 0.022451 0.022149 0.019809 -0.000560 -0.000965 -0.001375 0.032529 0.032595 0.030218 0.035653 0.039477 0.038721 -0.004297 -0.004164 -0.004172 -0.022521 0.011826 0.012075 0.010784 0.019158 0.020907 0.021017 0.031976 0.012384 0.009529 0.011464 0.040722 0.040755 0.038436 0.034102 0.031627 0.033987 0.013349 0.011055 0.013035 0.034536 0.031676 0.034514 0.041317 0.043776 0.043082 -0.000614 -0.001142 -0.000934 0.033840 -0.008664 -0.014782 1.000000 -0.000423 -0.006517 -0.005589 -0.008322 0.000739 -0.003774 0.000688 -0.000706 0.052036 0.056476 0.038745 -0.007124 -0.041114 0.036501 0.054457 0.022868 -0.009941 0.078099 0.003435 -0.035039 -0.010973 -0.008832 -0.069076 -0.067108 -0.012376
AMT_REQ_CREDIT_BUREAU_HOUR -0.002844 0.006416 0.006569 0.006513 0.000469 0.000675 -0.000420 0.002651 0.002833 0.003853 0.003737 0.003833 0.003881 0.001198 0.001142 0.001230 0.003907 -0.001021 -0.001104 -0.000234 -0.000325 -0.001259 -0.001086 -0.006640 0.002492 0.002035 0.001938 0.000570 0.000927 0.000721 0.001789 0.002413 0.001865 -0.002721 -0.002112 -0.002917 0.001753 0.002418 0.001925 0.001317 0.001105 0.001254 0.003927 0.003716 0.003707 0.002358 -0.001117 0.004792 -0.000423 1.000000 0.219818 -0.004533 -0.003131 -0.000042 -0.004294 0.000002 -0.002580 -0.003003 -0.003191 0.003610 0.000645 -0.000615 -0.017674 -0.003724 0.000290 -0.000417 -0.003025 0.003899 -0.003969 -0.001868 0.004427 0.006634 0.006760 -0.000547
AMT_REQ_CREDIT_BUREAU_DAY -0.001018 -0.000265 -0.000085 0.000204 -0.001643 -0.001680 -0.001305 0.003484 0.003390 0.003741 0.003338 0.003686 0.003681 0.001962 0.003460 0.003057 -0.006480 0.005569 0.005682 0.005862 0.004118 0.004760 0.005041 -0.004104 0.001485 0.000482 0.000551 0.002988 0.003282 0.003161 0.004611 0.004402 0.004587 0.006788 0.005709 0.006717 0.005269 0.005215 0.005444 0.001660 0.002235 0.002082 -0.000412 0.000386 0.000400 0.005557 -0.008654 0.221089 -0.006517 0.219818 1.000000 -0.003451 -0.004329 -0.002258 -0.002209 -0.002236 -0.001373 -0.000246 0.004451 0.001429 -0.000485 0.002352 0.000075 0.004057 0.002500 0.000581 0.001361 0.002007 0.001232 -0.000931 -0.002177 -0.001510 -0.001322 0.000813
AMT_REQ_CREDIT_BUREAU_YEAR 0.004930 -0.014661 -0.014401 -0.013372 0.001379 0.001970 0.002258 -0.013095 -0.012730 -0.012366 -0.008855 -0.010269 -0.010540 -0.020694 -0.021299 -0.021440 -0.015641 -0.011681 -0.012393 -0.010501 -0.011166 -0.012728 -0.012201 0.005301 -0.009466 -0.007350 -0.008823 -0.016773 -0.017063 -0.015946 -0.015733 -0.013849 -0.015310 -0.010360 -0.008534 -0.010426 -0.018721 -0.017109 -0.018801 -0.018166 -0.018978 -0.019034 -0.007690 -0.008031 -0.007979 -0.018790 -0.072853 0.016939 -0.005589 -0.004533 -0.003451 1.000000 0.073030 0.034751 0.016694 0.034265 0.019272 -0.022484 -0.051730 -0.011349 -0.028808 -0.113448 -0.030689 -0.049236 0.010620 -0.041786 0.002898 -0.072728 0.049800 -0.025366 -0.034662 0.010981 0.010322 0.018896
AMT_REQ_CREDIT_BUREAU_QRT -0.000050 -0.010515 -0.010050 -0.009280 0.002805 0.003295 0.003143 -0.008347 -0.008789 -0.008217 -0.004238 -0.004978 -0.004967 -0.006423 -0.007438 -0.007304 -0.017527 0.006480 0.006054 0.006728 -0.002863 -0.003567 -0.003754 -0.002403 -0.002690 -0.001136 -0.002270 -0.004685 -0.005053 -0.004350 -0.002850 -0.002504 -0.002789 -0.000025 0.000241 -0.000196 -0.002850 -0.001990 -0.002907 0.000556 -0.000114 0.000021 0.001574 0.001664 0.001780 -0.003923 -0.023523 -0.014195 -0.008322 -0.003131 -0.004329 0.073030 1.000000 0.004368 -0.000078 0.004627 -0.000950 -0.003633 0.015635 0.009594 -0.005218 -0.002055 -0.000416 0.015057 0.004531 -0.008286 -0.000677 -0.011702 0.014332 -0.000095 -0.007338 0.005321 0.004850 -0.002230
OBS_60_CNT_SOCIAL_CIRCLE -0.001489 -0.020677 -0.020014 -0.016636 -0.001056 -0.000561 -0.000231 -0.028310 -0.028427 -0.025142 -0.035979 -0.038168 -0.037967 0.001401 0.000646 0.000507 0.005161 -0.003551 -0.003694 -0.002552 -0.010674 -0.015154 -0.014443 -0.026333 -0.017470 -0.013123 -0.015676 -0.034902 -0.035805 -0.031987 -0.024016 -0.020334 -0.023605 0.000122 0.002181 -0.000307 -0.026816 -0.021698 -0.026049 -0.039061 -0.040739 -0.040378 -0.000131 -0.000455 -0.000366 -0.027016 -0.000080 -0.001789 0.000739 -0.000042 -0.002258 0.034751 0.004368 1.000000 0.234584 0.998362 0.308842 -0.019123 0.001816 -0.010986 0.025977 -0.015177 -0.010677 0.001722 -0.012351 0.015323 -0.010509 0.006292 0.006044 0.009425 -0.012644 0.034230 0.029777 0.009144
DEF_60_CNT_SOCIAL_CIRCLE 0.000678 -0.014209 -0.013928 -0.013215 -0.001319 -0.000911 -0.000246 -0.016995 -0.017006 -0.017523 -0.022872 -0.023657 -0.023556 -0.011099 -0.011636 -0.011478 0.011677 -0.001748 -0.001492 -0.002895 -0.011675 -0.013251 -0.012850 -0.030973 -0.013272 -0.012010 -0.012892 -0.023556 -0.024005 -0.023458 -0.016403 -0.016230 -0.016286 -0.004201 -0.004933 -0.004412 -0.017731 -0.016989 -0.017193 -0.030220 -0.030484 -0.030446 -0.003751 -0.005138 -0.005232 -0.018859 -0.034924 -0.003369 -0.003774 -0.004294 -0.002209 0.016694 -0.000078 0.234584 1.000000 0.232368 0.859132 -0.033888 -0.023002 -0.023382 -0.005347 0.002201 -0.009769 -0.022172 -0.012178 -0.003045 0.001552 0.001259 0.014949 0.004320 0.004500 0.017643 0.016739 0.029870
OBS_30_CNT_SOCIAL_CIRCLE -0.001404 -0.021039 -0.020368 -0.016998 -0.001377 -0.000880 -0.000553 -0.028816 -0.028928 -0.025629 -0.036522 -0.038671 -0.038457 0.001537 0.000839 0.000709 0.005222 -0.003813 -0.003964 -0.002832 -0.011010 -0.015466 -0.014743 -0.026887 -0.017583 -0.013260 -0.015779 -0.035381 -0.036295 -0.032493 -0.024522 -0.020814 -0.024106 -0.000143 0.001942 -0.000566 -0.027248 -0.022150 -0.026479 -0.039367 -0.041030 -0.040664 -0.000038 -0.000371 -0.000286 -0.027462 0.000248 -0.001919 0.000688 0.000002 -0.002236 0.034265 0.004627 0.998362 0.232368 1.000000 0.306435 -0.019501 0.001799 -0.011256 0.026342 -0.014661 -0.010689 0.001677 -0.012438 0.015670 -0.010980 0.006664 0.005798 0.009426 -0.012238 0.034598 0.030115 0.009272
DEF_30_CNT_SOCIAL_CIRCLE -0.000575 -0.012428 -0.012346 -0.011801 0.001349 0.001888 0.003069 -0.015635 -0.015667 -0.016124 -0.025390 -0.026169 -0.026158 -0.010162 -0.010555 -0.010424 0.007421 -0.002895 -0.002509 -0.003729 -0.009459 -0.010879 -0.010474 -0.028715 -0.013243 -0.011674 -0.012713 -0.022742 -0.023109 -0.022422 -0.013851 -0.013413 -0.013698 -0.000596 -0.001354 -0.000895 -0.015859 -0.014828 -0.015416 -0.030381 -0.030619 -0.030693 -0.003759 -0.005337 -0.005390 -0.017370 -0.038208 -0.003194 -0.000706 -0.002580 -0.001373 0.019272 -0.000950 0.308842 0.859132 0.306435 1.000000 -0.032222 -0.020983 -0.022416 -0.002822 0.000701 -0.006368 -0.019980 -0.012462 -0.001948 0.006005 -0.000538 0.017882 0.002464 0.002850 0.015480 0.014089 0.031837
EXT_SOURCE_2 0.001123 0.053179 0.051516 0.043665 0.019233 0.018113 0.016875 0.078604 0.080303 0.071318 0.106986 0.112450 0.111551 0.007695 0.010393 0.010791 -0.081239 0.021615 0.022506 0.017290 0.037158 0.047843 0.046458 0.213917 0.045519 0.037709 0.043267 0.113715 0.115388 0.106503 0.090343 0.079769 0.088616 0.031061 0.023618 0.032358 0.096877 0.085227 0.095325 0.129425 0.135144 0.133912 0.007867 0.008709 0.008639 0.094737 0.109183 0.001740 0.052036 -0.003003 -0.000246 -0.022484 -0.003633 -0.019123 -0.033888 -0.019501 -0.032222 1.000000 0.139108 0.125559 -0.001857 -0.195827 0.156600 0.131146 0.054966 -0.017545 0.198794 -0.091607 -0.019670 -0.058838 -0.050631 -0.291729 -0.287190 -0.159698
AMT_GOODS_PRICE 0.000227 0.049932 0.048917 0.041974 0.014541 0.013412 0.010851 0.061198 0.062989 0.054533 0.076515 0.080338 0.079628 0.038318 0.039981 0.040326 -0.106258 0.011375 0.011802 0.007835 0.037724 0.045509 0.043617 0.174615 0.044956 0.039309 0.042839 0.083950 0.085197 0.079855 0.067394 0.059877 0.065768 0.017277 0.013022 0.018333 0.078335 0.070392 0.077267 0.105551 0.108699 0.108049 0.006882 0.008124 0.007677 0.078645 0.047128 -0.001594 0.056476 -0.003191 0.004451 -0.051730 0.015635 0.001816 -0.023002 0.001799 -0.020983 0.139108 1.000000 0.774414 0.060464 -0.076893 0.062811 0.986998 0.146114 -0.002337 0.105018 -0.053663 -0.064092 0.012095 -0.008840 -0.104647 -0.113207 -0.039304
AMT_ANNUITY -0.000003 0.056695 0.055852 0.047572 0.022276 0.021405 0.017211 0.074110 0.076515 0.065992 0.094729 0.100250 0.098972 0.030641 0.032850 0.033351 -0.099371 0.005896 0.006374 0.001457 0.036378 0.046552 0.044472 0.119410 0.054684 0.045902 0.051977 0.102732 0.104433 0.096243 0.079158 0.068941 0.077002 0.012701 0.006746 0.014063 0.091897 0.081540 0.090548 0.128378 0.132397 0.131363 0.015115 0.015545 0.015242 0.092692 0.029045 0.013018 0.038745 0.003610 0.001429 -0.011349 0.009594 -0.010986 -0.023382 -0.011256 -0.022416 0.125559 0.774414 1.000000 0.075081 -0.064906 0.053074 0.769449 0.175849 0.020850 0.119916 0.008731 -0.103850 0.038813 0.011894 -0.129451 -0.143008 -0.012715
CNT_FAM_MEMBERS -0.002231 0.000262 0.000731 0.000838 0.002755 0.003062 0.002576 -0.004163 -0.004810 -0.004381 -0.001186 -0.002877 -0.002178 0.041360 0.041839 0.041869 -0.015176 0.000430 0.000102 0.001572 -0.004981 -0.005527 -0.005794 -0.096102 0.004982 0.005223 0.005025 0.000133 -0.000277 0.001385 -0.010062 -0.008183 -0.009991 -0.003046 -0.001079 -0.002855 -0.003996 -0.002208 -0.003830 -0.001571 -0.002280 -0.001868 0.007266 0.007922 0.007665 -0.001398 -0.029311 -0.002436 -0.007124 0.000645 -0.000485 -0.028808 -0.005218 0.025977 -0.005347 0.026342 -0.002822 -0.001857 0.060464 0.075081 1.000000 -0.027481 -0.012143 0.062528 0.015713 0.878837 -0.024273 0.278429 -0.233456 0.174431 -0.020803 0.030923 0.031620 0.010330
DAYS_LAST_PHONE_CHANGE 0.000776 -0.002659 -0.002478 -0.000391 0.001123 0.001182 0.000882 -0.002901 -0.003382 -0.003171 -0.006971 -0.007270 -0.007243 0.011749 0.011615 0.011920 0.002689 -0.000237 0.000591 -0.000183 -0.005732 -0.006458 -0.007090 -0.130211 -0.004054 -0.004427 -0.004454 -0.011752 -0.011418 -0.010413 -0.008792 -0.007959 -0.009078 -0.012220 -0.011140 -0.012022 -0.011096 -0.009979 -0.011248 -0.006165 -0.006622 -0.006550 0.001918 0.003131 0.003533 -0.008522 -0.075542 -0.002318 -0.041114 -0.000615 0.002352 -0.113448 -0.002055 -0.015177 0.002201 -0.014661 0.000701 -0.195827 -0.076893 -0.064906 -0.027481 1.000000 -0.015647 -0.074388 -0.017254 -0.006180 -0.046043 0.083957 0.023129 0.056938 0.086779 0.026558 0.025939 0.054953
HOUR_APPR_PROCESS_START 0.000205 0.047662 0.046151 0.040003 0.014680 0.014174 0.012107 0.078353 0.079959 0.072238 0.113720 0.119442 0.118550 -0.016409 -0.014470 -0.014282 -0.069504 0.014274 0.014503 0.011613 0.034527 0.041399 0.040947 0.032487 0.044565 0.038635 0.043368 0.105367 0.106407 0.099335 0.083651 0.074785 0.082497 0.021172 0.016610 0.021492 0.084724 0.075350 0.083559 0.113995 0.119406 0.118014 -0.011315 -0.010619 -0.010170 0.080780 -0.040533 -0.004517 0.036501 -0.017674 0.000075 -0.030689 -0.000416 -0.010677 -0.009769 -0.010689 -0.006368 0.156600 0.062811 0.053074 -0.012143 -0.015647 1.000000 0.053257 0.033784 -0.006909 0.171821 0.092099 -0.090384 -0.011111 0.032615 -0.285609 -0.265247 -0.022945
AMT_CREDIT 0.000214 0.049198 0.048203 0.041446 0.013413 0.012401 0.010076 0.058731 0.060508 0.052481 0.074611 0.078129 0.077513 0.033075 0.034655 0.034931 -0.096874 0.004690 0.005175 0.001402 0.033595 0.041226 0.039395 0.167599 0.040894 0.035297 0.038741 0.081052 0.082385 0.076927 0.063280 0.055799 0.061599 0.013505 0.009157 0.014622 0.073658 0.065696 0.072571 0.100857 0.103899 0.103290 0.005819 0.007028 0.006500 0.074399 0.043049 -0.001802 0.054457 -0.003724 0.004057 -0.049236 0.015057 0.001722 -0.022172 0.001677 -0.019980 0.131146 0.986998 0.769449 0.062528 -0.074388 0.053257 1.000000 0.143687 0.001776 0.101220 -0.055576 -0.066224 0.010353 -0.006176 -0.102672 -0.111988 -0.030187
AMT_INCOME_TOTAL -0.001795 0.086203 0.084201 0.072656 0.030406 0.028913 0.025624 0.105237 0.107432 0.092782 0.130492 0.139013 0.137605 0.038279 0.042482 0.042782 -0.119654 -0.002390 -0.002143 -0.004020 0.011618 0.015454 0.014711 0.023251 0.077089 0.064064 0.073302 0.039690 0.040491 0.036894 0.031310 0.027137 0.030678 0.004576 0.002139 0.005134 0.035924 0.031260 0.035275 0.052066 0.054379 0.053956 0.005204 0.005564 0.005571 0.037922 -0.029240 0.001770 0.022868 0.000290 0.002500 0.010620 0.004531 -0.012351 -0.012178 -0.012438 -0.012462 0.054966 0.146114 0.175849 0.015713 -0.017254 0.033784 0.143687 1.000000 0.012452 0.068597 0.025544 -0.058891 0.025475 0.008070 -0.078886 -0.084670 -0.002481
CNT_CHILDREN -0.000688 -0.000503 -0.000145 -0.000906 0.004179 0.004442 0.004294 -0.005822 -0.006488 -0.006230 -0.009376 -0.010143 -0.009670 0.029196 0.029595 0.029646 0.009539 -0.004147 -0.004457 -0.003953 -0.009291 -0.009050 -0.009238 -0.138459 0.003166 0.002664 0.003049 -0.005835 -0.006032 -0.005723 -0.012330 -0.011344 -0.012264 -0.006975 -0.005575 -0.006867 -0.009387 -0.008615 -0.009594 -0.009677 -0.009643 -0.009383 0.006001 0.006926 0.006571 -0.006763 -0.043570 -0.003201 -0.009941 -0.000417 0.000581 -0.041786 -0.008286 0.015323 -0.003045 0.015670 -0.001948 -0.017545 -0.002337 0.020850 0.878837 -0.006180 -0.006909 0.001776 0.012452 1.000000 -0.025826 0.331623 -0.240468 0.183940 -0.028503 0.025528 0.024614 0.019552
REGION_POPULATION_RELATIVE 0.001271 0.168101 0.163327 0.134159 0.024268 0.021699 0.016331 0.190426 0.195956 0.164517 0.273877 0.292362 0.288614 -0.064028 -0.058163 -0.057069 -0.082891 -0.053101 -0.051987 -0.061096 0.066314 0.098987 0.094199 0.098941 0.076143 0.051838 0.067917 0.275174 0.281380 0.252585 0.206390 0.175433 0.201838 0.033167 0.015755 0.036256 0.214648 0.182047 0.210470 0.303690 0.322096 0.317838 -0.006707 -0.006570 -0.006542 0.203455 -0.006362 -0.003104 0.078099 -0.003025 0.001361 0.002898 -0.000677 -0.010509 0.001552 -0.010980 0.006005 0.198794 0.105018 0.119916 -0.024273 -0.046043 0.171821 0.101220 0.068597 -0.025826 1.000000 -0.029078 -0.003825 -0.052062 -0.003950 -0.532986 -0.531728 -0.037004
DAYS_BIRTH -0.000841 0.006585 0.007296 0.007584 0.000849 0.000777 0.001163 0.013687 0.013299 0.013336 0.000420 0.001133 0.001302 0.025823 0.027171 0.026899 0.007699 0.004539 0.004210 0.004763 -0.002691 -0.002384 -0.002358 -0.598890 0.004914 0.004090 0.005578 -0.000223 -0.000371 -0.000107 0.006776 0.006669 0.006985 -0.008534 -0.008220 -0.008986 0.001366 0.001559 0.001903 0.001685 0.002227 0.002280 0.001740 0.002015 0.002051 0.002688 -0.206463 -0.000823 0.003435 0.003899 0.002007 -0.072728 -0.011702 0.006292 0.001259 0.006664 -0.000538 -0.091607 -0.053663 0.008731 0.278429 0.083957 0.092099 -0.055576 0.025544 0.331623 -0.029078 1.000000 -0.615504 0.331472 0.272287 0.008738 0.007549 0.078418
DAYS_EMPLOYED 0.001274 -0.008967 -0.009276 -0.009378 -0.002721 -0.002782 -0.003421 -0.020043 -0.020296 -0.019826 -0.013644 -0.014006 -0.014512 -0.006851 -0.007974 -0.007603 0.028075 -0.011408 -0.011420 -0.010425 -0.000176 -0.001224 -0.001120 0.289068 -0.014019 -0.012948 -0.014081 -0.008678 -0.008651 -0.008199 -0.017006 -0.015507 -0.016718 0.002773 0.003498 0.002734 -0.012905 -0.011724 -0.013176 -0.014106 -0.014993 -0.015050 0.008376 0.008846 0.008620 -0.014987 0.114225 0.002864 -0.035039 -0.003969 0.001232 0.049800 0.014332 0.006044 0.014949 0.005798 0.017882 -0.019670 -0.064092 -0.103850 -0.233456 0.023129 -0.090384 -0.066224 -0.058891 -0.240468 -0.003825 -0.615504 1.000000 -0.210273 -0.272791 0.032585 0.034407 -0.045064
DAYS_REGISTRATION -0.000630 0.024592 0.025303 0.025497 0.035364 0.034240 0.032723 0.025284 0.024839 0.023973 0.019499 0.020757 0.020821 0.163429 0.164861 0.165196 -0.025165 0.003442 0.003438 0.004006 -0.018812 -0.020079 -0.020656 -0.178719 0.052079 0.049933 0.052898 0.000790 -0.000080 0.001957 0.013472 0.013142 0.013577 -0.062268 -0.059319 -0.062525 0.007223 0.008007 0.007687 0.049158 0.049425 0.049661 0.010382 0.012817 0.012831 0.019829 -0.106684 -0.001097 -0.010973 -0.001868 -0.000931 -0.025366 -0.000095 0.009425 0.004320 0.009426 0.002464 -0.058838 0.012095 0.038813 0.174431 0.056938 -0.011111 0.010353 0.025475 0.183940 -0.052062 0.331472 -0.210273 1.000000 0.101934 0.079297 0.072988 0.040217
DAYS_ID_PUBLISH -0.000887 -0.000485 -0.000236 -0.000491 -0.008094 -0.007466 -0.007737 0.000204 0.000710 0.000049 -0.009859 -0.009386 -0.009253 -0.009393 -0.009253 -0.009454 0.008747 -0.005515 -0.005355 -0.005961 -0.011839 -0.012849 -0.013046 -0.132527 0.001327 0.000026 0.001627 -0.010731 -0.010767 -0.010496 -0.006499 -0.006220 -0.006452 -0.013221 -0.012944 -0.013075 -0.010633 -0.010999 -0.010498 -0.011584 -0.011297 -0.011444 -0.001100 -0.002182 -0.001754 -0.010000 -0.131930 -0.002042 -0.008832 0.004427 -0.002177 -0.034662 -0.007338 -0.012644 0.004500 -0.012238 0.002850 -0.050631 -0.008840 0.011894 -0.020803 0.086779 0.032615 -0.006176 0.008070 -0.028503 -0.003950 0.272287 -0.272791 0.101934 1.000000 -0.005385 -0.008018 0.051695
REGION_RATING_CLIENT -0.001853 -0.120701 -0.117366 -0.095498 -0.018347 -0.015891 -0.010272 -0.152176 -0.156766 -0.129571 -0.215123 -0.229994 -0.227258 0.048298 0.043189 0.042167 0.086297 0.046965 0.045123 0.058796 -0.032146 -0.061396 -0.057318 -0.113677 -0.082002 -0.059503 -0.075075 -0.221633 -0.227037 -0.201538 -0.152610 -0.123397 -0.148410 -0.021531 -0.004438 -0.023626 -0.164884 -0.133665 -0.161042 -0.219861 -0.235021 -0.231637 0.004547 0.004508 0.004512 -0.161519 -0.012732 0.003056 -0.069076 0.006634 -0.001510 0.010981 0.005321 0.034230 0.017643 0.034598 0.015480 -0.291729 -0.104647 -0.129451 0.030923 0.026558 -0.285609 -0.102672 -0.078886 0.025528 -0.532986 0.008738 0.032585 0.079297 -0.005385 1.000000 0.950316 0.058141
REGION_RATING_CLIENT_W_CITY -0.001741 -0.130876 -0.127754 -0.107276 -0.021329 -0.019139 -0.014123 -0.176999 -0.181184 -0.155851 -0.222929 -0.236985 -0.234200 0.040781 0.036414 0.035435 0.087654 0.037945 0.036342 0.048524 -0.046738 -0.074168 -0.070167 -0.113373 -0.082463 -0.060918 -0.075988 -0.233193 -0.238425 -0.213864 -0.172048 -0.143996 -0.167831 -0.028446 -0.012132 -0.030790 -0.183204 -0.153244 -0.179341 -0.237230 -0.251429 -0.248063 -0.000838 -0.000733 -0.000644 -0.178946 -0.012105 0.002039 -0.067108 0.006760 -0.001322 0.010322 0.004850 0.029777 0.016739 0.030115 0.014089 -0.287190 -0.113207 -0.143008 0.031620 0.025939 -0.265247 -0.111988 -0.084670 0.024614 -0.531728 0.007549 0.034407 0.072988 -0.008018 0.950316 1.000000 0.059963
TARGET -0.000581 -0.021858 -0.021818 -0.019588 -0.003702 -0.002904 -0.001785 -0.025916 -0.026580 -0.024955 -0.033119 -0.033705 -0.033636 -0.025586 -0.025933 -0.025685 0.039531 -0.013984 -0.013539 -0.012519 -0.021323 -0.023834 -0.023122 -0.155781 -0.012034 -0.010751 -0.011442 -0.035791 -0.036381 -0.034306 -0.031644 -0.029427 -0.031137 -0.020116 -0.018407 -0.020484 -0.035242 -0.032972 -0.034857 -0.045368 -0.046041 -0.045861 -0.009553 -0.010557 -0.010934 -0.035540 -0.180865 -0.001428 -0.012376 -0.000547 0.000813 0.018896 -0.002230 0.009144 0.029870 0.009272 0.031837 -0.159698 -0.039304 -0.012715 0.010330 0.054953 -0.022945 -0.030187 -0.002481 0.019552 -0.037004 0.078418 -0.045064 0.040217 0.051695 0.058141 0.059963 1.000000
In [20]:
f_aux.get_corr_matrix(dataset = df_loan_train[list_var_continuous], 
                metodo='pearson', size_figure=[10,8])
[Figure: Pearson correlation heatmap of the continuous variables in the training set]
Out[20]:
0

Of the correlations observed, I would like to highlight two:

  1. AMT_CREDIT and AMT_ANNUITY have a positive correlation of about 0.77: as the amount lent to the client increases, so does the loan annuity.

  2. AMT_CREDIT and AMT_GOODS_PRICE show a positive linear correlation of about 0.99: the larger the amount lent to the client, the higher the price of the goods the loan was granted for. This is to be expected.

Beyond these two pairs, TARGET is not strongly correlated with any of the continuous variables, so no single variable has a strong linear relationship with our target.
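Pairs like the two above can also be pulled out of a correlation matrix programmatically rather than read off the heatmap. The following is a minimal sketch, not the notebook's own method: `high_corr_pairs`, the 0.75 threshold and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(corr: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
    """Return variable pairs whose absolute correlation is >= threshold.

    Hypothetical helper written for this example.
    """
    # Keep only the strict upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().reset_index()
    pairs.columns = ["var_1", "var_2", "corr"]
    # NaN entries (masked cells) never pass this comparison
    pairs = pairs[pairs["corr"].abs() >= threshold]
    # Order by absolute strength, strongest first
    order = pairs["corr"].abs().sort_values(ascending=False).index
    return pairs.loc[order].reset_index(drop=True)


# Toy data (made up): b is a noisy copy of a, c is independent of both
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

result = high_corr_pairs(toy.corr(method="pearson"), threshold=0.75)
print(result)
```

In the notebook, the same function could presumably be applied to the Pearson matrix of `df_loan_train[list_var_continuous]`, where it would also surface the many near-duplicate `_AVG` / `_MEDI` / `_MODE` columns.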

In [21]:
corr.loc['TARGET'].sort_values(ascending=False)
Out[21]:
TARGET                          1.000000
DAYS_BIRTH                      0.078418
REGION_RATING_CLIENT_W_CITY     0.059963
REGION_RATING_CLIENT            0.058141
DAYS_LAST_PHONE_CHANGE          0.054953
DAYS_ID_PUBLISH                 0.051695
DAYS_REGISTRATION               0.040217
OWN_CAR_AGE                     0.039531
DEF_30_CNT_SOCIAL_CIRCLE        0.031837
DEF_60_CNT_SOCIAL_CIRCLE        0.029870
CNT_CHILDREN                    0.019552
AMT_REQ_CREDIT_BUREAU_YEAR      0.018896
CNT_FAM_MEMBERS                 0.010330
OBS_30_CNT_SOCIAL_CIRCLE        0.009272
OBS_60_CNT_SOCIAL_CIRCLE        0.009144
AMT_REQ_CREDIT_BUREAU_DAY       0.000813
AMT_REQ_CREDIT_BUREAU_HOUR     -0.000547
SK_ID_CURR                     -0.000581
AMT_REQ_CREDIT_BUREAU_WEEK     -0.001428
NONLIVINGAPARTMENTS_MODE       -0.001785
AMT_REQ_CREDIT_BUREAU_QRT      -0.002230
AMT_INCOME_TOTAL               -0.002481
NONLIVINGAPARTMENTS_MEDI       -0.002904
NONLIVINGAPARTMENTS_AVG        -0.003702
YEARS_BEGINEXPLUATATION_MODE   -0.009553
YEARS_BEGINEXPLUATATION_AVG    -0.010557
NONLIVINGAREA_MODE             -0.010751
YEARS_BEGINEXPLUATATION_MEDI   -0.010934
NONLIVINGAREA_MEDI             -0.011442
NONLIVINGAREA_AVG              -0.012034
AMT_REQ_CREDIT_BUREAU_MON      -0.012376
LANDAREA_MODE                  -0.012519
AMT_ANNUITY                    -0.012715
LANDAREA_AVG                   -0.013539
LANDAREA_MEDI                  -0.013984
ENTRANCES_MODE                 -0.018407
COMMONAREA_MODE                -0.019588
ENTRANCES_MEDI                 -0.020116
ENTRANCES_AVG                  -0.020484
BASEMENTAREA_MODE              -0.021323
COMMONAREA_MEDI                -0.021818
COMMONAREA_AVG                 -0.021858
HOUR_APPR_PROCESS_START        -0.022945
BASEMENTAREA_MEDI              -0.023122
BASEMENTAREA_AVG               -0.023834
LIVINGAPARTMENTS_MODE          -0.024955
YEARS_BUILD_MODE               -0.025586
YEARS_BUILD_AVG                -0.025685
LIVINGAPARTMENTS_MEDI          -0.025916
YEARS_BUILD_MEDI               -0.025933
LIVINGAPARTMENTS_AVG           -0.026580
APARTMENTS_MODE                -0.029427
AMT_CREDIT                     -0.030187
APARTMENTS_MEDI                -0.031137
APARTMENTS_AVG                 -0.031644
LIVINGAREA_MODE                -0.032972
FLOORSMIN_MODE                 -0.033119
FLOORSMIN_MEDI                 -0.033636
FLOORSMIN_AVG                  -0.033705
ELEVATORS_MODE                 -0.034306
LIVINGAREA_MEDI                -0.034857
LIVINGAREA_AVG                 -0.035242
TOTALAREA_MODE                 -0.035540
ELEVATORS_MEDI                 -0.035791
ELEVATORS_AVG                  -0.036381
REGION_POPULATION_RELATIVE     -0.037004
AMT_GOODS_PRICE                -0.039304
DAYS_EMPLOYED                  -0.045064
FLOORSMAX_MODE                 -0.045368
FLOORSMAX_MEDI                 -0.045861
FLOORSMAX_AVG                  -0.046041
EXT_SOURCE_1                   -0.155781
EXT_SOURCE_2                   -0.159698
EXT_SOURCE_3                   -0.180865
Name: TARGET, dtype: float64

No variable shows a strong linear relationship with TARGET (the largest absolute correlation is about 0.18, for EXT_SOURCE_3), which seems reasonable for a problem as complex as detecting loan payment difficulties.
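Because the listing above is sorted by signed value, the strongest correlations (all negative) end up at the bottom. A small sketch of ranking by magnitude instead is shown here; the values are a subset copied from the output above, and the variable names `target_corr` and `ranked` are only illustrative.

```python
import pandas as pd

# Subset of the TARGET correlations listed above, copied for illustration
target_corr = pd.Series({
    "DAYS_BIRTH": 0.078418,
    "REGION_RATING_CLIENT_W_CITY": 0.059963,
    "AMT_INCOME_TOTAL": -0.002481,
    "EXT_SOURCE_1": -0.155781,
    "EXT_SOURCE_2": -0.159698,
    "EXT_SOURCE_3": -0.180865,
})

# Rank by absolute value so strong negative correlations come first
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

On the full series, the same reindexing could be applied to `corr.loc['TARGET'].drop('TARGET')`, assuming `corr` is the correlation matrix computed earlier in the notebook.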

Handling missing values

How missing values should be handled depends on the context we are working in, the nature of the data, and the impact the missing values may have on the analysis or the machine learning model. In general there are several options for imputing missing values:

  1. For numerical variables: impute the mean when the variable is roughly normally distributed, or the median when it has outliers. Alternatives are imputing a fixed, predetermined value, or using a more advanced imputation algorithm such as KNN, which estimates each missing value from the rows that are most similar in the other columns.

  2. For categorical variables: impute the mode when the variable has a clearly dominant value, or assign a fixed value such as 'Desconocido' ("Unknown").

In my case, since I do not have much context about the variables, I will impute the missing values of the categorical variables with the fixed value 'Desconocido', because we do not really know why those values are missing. I prefer not to impute the mode: in some categorical variables no value clearly dominates the others, so doing so could distort the distribution of those variables.

For the numerical variables I will impute the median. Most of them are not normally distributed and, although the share of outliers is not large, the median is not affected by extreme values, unlike the mean. In addition, many machine learning models are sensitive to extreme values, so using the median reduces the chance that the imputed values introduce unwanted noise or bias.
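The effect of a single extreme value on the two statistics can be seen in a tiny worked example. The numbers below are made up for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry and one extreme value
s = pd.Series([10.0, 12.0, 11.0, np.nan, 13.0, 1000.0])

mean_value = s.mean()      # pulled up by the extreme value
median_value = s.median()  # depends only on the middle of the ordered values

# Fill the missing entry with each statistic to compare the results
filled_mean = s.fillna(mean_value)
filled_median = s.fillna(median_value)

print("mean:", mean_value, "median:", median_value)
```

Here the mean-imputed entry lands far from every typical observation, while the median-imputed entry stays within the bulk of the data.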

For the boolean variables (those that take only the values 0 or 1) I will impute the mode instead: the imputed value has to be one of the two valid values, and the median is not a meaningful summary of a distribution that takes only two values.
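The three choices described above (median for numerical columns, mode for 0/1 columns, a fixed label for categorical columns) can be sketched on toy data as follows. This is an illustration under assumptions, not the notebook's implementation: the frames and column names are invented, and it assumes the fill values are learned on the training split and then reused on the test split so that no test information leaks into the imputation.

```python
import numpy as np
import pandas as pd

# Made-up frames standing in for the train and test splits
train = pd.DataFrame({
    "amount": [100.0, np.nan, 300.0, 200.0],  # numerical
    "flag": [1.0, 1.0, np.nan, 0.0],          # binary 0/1
    "job": ["A", None, "B", "A"],             # categorical
})
test = pd.DataFrame({
    "amount": [np.nan, 150.0],
    "flag": [np.nan, 0.0],
    "job": [None, "B"],
})

# Learn the fill values from the training split only
fill_values = {
    "amount": train["amount"].median(),  # median for numerical columns
    "flag": train["flag"].mode()[0],     # mode for binary columns
    "job": "Desconocido",                # fixed label for categorical columns
}

# Apply the same values to both splits
train_imputed = train.fillna(fill_values)
test_imputed = test.fillna(fill_values)

print(test_imputed)
```

Passing a dictionary to `fillna` applies each value only to the column with the matching name, which keeps the three strategies in one place.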

In [22]:
list_cat_vars, other = f_aux.dame_variables_categoricas(dataset=df_loan_train)

# Make sure columns with categorical dtype accept the 'Desconocido' ("Unknown") category
for col in list_cat_vars:
    # isinstance check replaces the deprecated pd.api.types.is_categorical_dtype
    if isinstance(df_loan_train[col].dtype, pd.CategoricalDtype):
        # Add 'Desconocido' as a category only if it does not exist yet
        if 'Desconocido' not in df_loan_train[col].cat.categories:
            df_loan_train[col] = df_loan_train[col].cat.add_categories(['Desconocido'])

# Impute missing values with 'Desconocido'
df_loan_train[list_cat_vars] = df_loan_train[list_cat_vars].fillna(value='Desconocido')


df_loan_train[list_cat_vars]
Out[22]:
FONDKAPREMONT_MODE WALLSMATERIAL_MODE HOUSETYPE_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE NAME_TYPE_SUITE ORGANIZATION_TYPE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_EDUCATION_TYPE FLAG_OWN_REALTY WEEKDAY_APPR_PROCESS_START
238851 Desconocido Desconocido Desconocido Desconocido Laborers Unaccompanied Business Entity Type 1 Revolving loans N M Working Single / not married House / apartment Secondary / secondary special N MONDAY
181603 Desconocido Stone, brick block of flats No Sales staff Unaccompanied Self-employed Cash loans Y F Working Married House / apartment Secondary / secondary special N WEDNESDAY
63661 reg oper account Stone, brick block of flats No Sales staff Family Business Entity Type 3 Cash loans N F Commercial associate Married House / apartment Secondary / secondary special Y FRIDAY
122457 reg oper account Stone, brick block of flats No Sales staff Unaccompanied Industry: type 6 Cash loans N F Working Civil marriage House / apartment Secondary / secondary special Y MONDAY
70875 Desconocido Block block of flats No Drivers Unaccompanied Other Cash loans Y M Commercial associate Married House / apartment Secondary / secondary special Y TUESDAY
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
216116 reg oper account Stone, brick block of flats No Laborers Unaccompanied Self-employed Cash loans Y M Working Married House / apartment Secondary / secondary special N FRIDAY
168796 Desconocido Stone, brick block of flats No Sales staff Family Business Entity Type 3 Cash loans N F Commercial associate Married House / apartment Higher education Y THURSDAY
241375 Desconocido Desconocido Desconocido Desconocido Core staff Unaccompanied Business Entity Type 1 Cash loans N F Commercial associate Single / not married House / apartment Secondary / secondary special N SATURDAY
297753 reg oper account Panel block of flats No Waiters/barmen staff Unaccompanied Self-employed Cash loans N F Working Married House / apartment Incomplete higher N TUESDAY
108462 Desconocido Desconocido Desconocido Desconocido Laborers Unaccompanied Business Entity Type 2 Cash loans N F Working Married House / apartment Secondary / secondary special Y THURSDAY

246008 rows × 16 columns

We do not observe null values in the boolean columns. If there were any and we needed to replace them with the mode, we could use the loop that is commented out in the following cell.

In [23]:
df_loan_train[df_loan_bool].isnull().sum()

# for col in df_loan_bool:
#     # Compute the mode of the column
#     moda = df_loan_train[col].mode()[0]
#     # Replace null values with the mode
#     df_loan_train[col] = df_loan_train[col].fillna(moda)
Out[23]:
REG_REGION_NOT_LIVE_REGION     0
FLAG_MOBIL                     0
FLAG_EMP_PHONE                 0
FLAG_WORK_PHONE                0
FLAG_CONT_MOBILE               0
TARGET                         0
LIVE_REGION_NOT_WORK_REGION    0
FLAG_EMAIL                     0
FLAG_PHONE                     0
REG_CITY_NOT_LIVE_CITY         0
REG_CITY_NOT_WORK_CITY         0
LIVE_CITY_NOT_WORK_CITY        0
REG_REGION_NOT_WORK_REGION     0
FLAG_DOCUMENT_4                0
FLAG_DOCUMENT_5                0
FLAG_DOCUMENT_2                0
FLAG_DOCUMENT_3                0
FLAG_DOCUMENT_11               0
FLAG_DOCUMENT_10               0
FLAG_DOCUMENT_9                0
FLAG_DOCUMENT_8                0
FLAG_DOCUMENT_7                0
FLAG_DOCUMENT_6                0
FLAG_DOCUMENT_12               0
FLAG_DOCUMENT_13               0
FLAG_DOCUMENT_19               0
FLAG_DOCUMENT_18               0
FLAG_DOCUMENT_17               0
FLAG_DOCUMENT_16               0
FLAG_DOCUMENT_15               0
FLAG_DOCUMENT_14               0
FLAG_DOCUMENT_20               0
FLAG_DOCUMENT_21               0
dtype: int64
In [24]:
# Impute null values in numeric columns with the median
for col in df_loan_train.select_dtypes(include=['number']).columns:
    # Compute the median of the column
    mediana = df_loan_train[col].median()
    # Replace null values with the median
    df_loan_train[col] = df_loan_train[col].fillna(mediana)

df_loan_train[df_loan_num].head(10)
Out[24]:
SK_ID_CURR COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MODE FLOORSMIN_AVG YEARS_BUILD_MODE YEARS_BUILD_MEDI YEARS_BUILD_AVG OWN_CAR_AGE LANDAREA_MEDI LANDAREA_AVG LANDAREA_MODE BASEMENTAREA_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI EXT_SOURCE_1 NONLIVINGAREA_AVG NONLIVINGAREA_MODE NONLIVINGAREA_MEDI ELEVATORS_AVG APARTMENTS_AVG APARTMENTS_MODE APARTMENTS_MEDI ENTRANCES_AVG LIVINGAREA_AVG LIVINGAREA_MODE LIVINGAREA_MEDI FLOORSMAX_AVG FLOORSMAX_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI TOTALAREA_MODE EXT_SOURCE_3 EXT_SOURCE_2 AMT_GOODS_PRICE AMT_ANNUITY DAYS_LAST_PHONE_CHANGE ORGANIZATION_TYPE AMT_CREDIT AMT_INCOME_TOTAL REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH
238851 376683 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0874 0.1379 0.0745 0.0731 0.0749 0.1667 0.1667 0.9816 0.9816 0.9816 0.0688 0.204423 0.409389 180000.0 9000.0 -1237.0 Business Entity Type 1 180000.0 135000.0 0.008474 -9935 -869 -3440.0 -2546
181603 310487 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 2.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0036 0.0011 0.0030 0.00 0.0814 0.0830 0.0822 0.2069 0.0919 0.0957 0.0935 0.1667 0.1667 0.9926 0.9925 0.9925 0.0723 0.722393 0.728032 679500.0 28404.0 -1692.0 Self-employed 787131.0 135000.0 0.010147 -10078 -1289 -608.0 -1233
63661 173827 0.0116 0.0116 0.0117 0.0000 0.0000 0.0000 0.0599 0.0588 0.0643 0.2083 0.7060 0.6981 0.6940 9.0 0.0644 0.0633 0.0647 0.0552 0.0532 0.0532 0.233131 0.0000 0.0000 0.0000 0.00 0.0722 0.0735 0.0729 0.1724 0.0525 0.0547 0.0535 0.1667 0.1667 0.9777 0.9776 0.9776 0.0568 0.535276 0.392192 229500.0 27454.5 -777.0 Business Entity Type 3 253737.0 189000.0 0.046220 -9425 -435 -4201.0 -98
122457 241978 0.0064 0.0065 0.0065 0.0039 0.0039 0.0039 0.0547 0.0538 0.0588 0.2083 0.6864 0.6780 0.6736 9.0 0.0333 0.0328 0.0335 0.0845 0.0815 0.0815 0.889098 0.0509 0.0539 0.0520 0.00 0.0670 0.0683 0.0677 0.1379 0.0508 0.0529 0.0517 0.1667 0.1667 0.9762 0.9762 0.9762 0.0546 0.481249 0.568924 477000.0 17775.0 -2639.0 Industry: type 6 552555.0 90000.0 0.031329 -20494 -2304 -10741.0 -4051
70875 182206 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 5.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0000 0.0000 0.0000 0.00 0.0753 0.0588 0.0760 0.1724 0.0745 0.0731 0.0749 0.1667 0.1667 0.9836 0.9836 0.9836 0.0408 0.275000 0.294987 477000.0 40797.0 -129.0 Other 558855.0 225000.0 0.031329 -14160 -289 -6104.0 -4666
233090 369981 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0874 0.1379 0.0977 0.1018 0.0995 0.1667 0.1667 0.9901 0.9901 0.9901 0.0768 0.616122 0.285898 229500.0 25227.0 -2148.0 Security 253737.0 67500.0 0.008068 -19518 -1189 -1167.0 -2764
148840 272567 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0874 0.1379 0.0745 0.0731 0.0749 0.1667 0.1667 0.9816 0.9816 0.9816 0.0688 0.586740 0.654621 360000.0 15790.5 -721.0 XNA 436032.0 90000.0 0.003122 -23128 365243 -7790.0 -748
176528 304561 0.0000 0.0000 0.0000 0.0039 0.0039 0.0039 0.0676 0.0664 0.0725 0.2083 0.6929 0.6847 0.6804 9.0 0.0709 0.0697 0.0713 0.0699 0.0673 0.0673 0.505819 0.0029 0.0030 0.0029 0.00 0.0825 0.0840 0.0833 0.1379 0.0692 0.0721 0.0705 0.1667 0.1667 0.9767 0.9767 0.9767 0.0723 0.691021 0.673752 1350000.0 47443.5 0.0 XNA 1506816.0 135000.0 0.018801 -20207 365243 -9765.0 -3675
201528 333611 0.0211 0.0209 0.0191 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.2083 0.7648 0.7585 0.7552 9.0 0.0488 0.0483 0.0459 0.0747 0.0764 0.0759 0.505819 0.0036 0.0011 0.0030 0.00 0.0876 0.0840 0.0874 0.1379 0.0745 0.0731 0.0749 0.1667 0.1667 0.9816 0.9816 0.9816 0.0688 0.586740 0.636360 166500.0 7686.0 -929.0 School 210456.0 135000.0 0.018029 -18133 -2278 -8441.0 -1678
213939 347914 0.0332 0.0334 0.0335 0.0000 0.0000 0.0000 0.0761 0.0756 0.0771 0.0417 0.8236 0.8189 0.8164 9.0 0.0196 0.0193 0.0198 0.0358 0.0345 0.0345 0.505819 0.1120 0.1186 0.1144 0.04 0.0557 0.0567 0.0562 0.0345 0.0369 0.0385 0.0376 0.3333 0.3333 0.9866 0.9866 0.9866 0.0472 0.379100 0.003770 450000.0 31261.5 -550.0 Self-employed 640080.0 112500.0 0.008230 -20770 -1077 -8671.0 -3281
In [25]:
f_aux.get_percent_null_values_target(df_loan_train, [i for i in list_var_continuous], target='TARGET')
No existen variables con valores nulos
Out[25]:

This confirms that all the null-value imputations were carried out successfully.

Correlation matrix for categorical variables: Cramér's V matrix¶

Pearson's coefficient cannot be used to measure the association between categorical variables, so we use Cramér's V as the closest equivalent. It lets us examine the association between our categorical variables.

Although our boolean variables take the numeric values 0 and 1, their origin and interpretation are categorical: a value of 0 denotes a different category from a value of 1. We therefore treat them as categorical and measure their association with Cramér's V as well.
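As a reference for how the statistic is built, here is a minimal sketch of Cramér's V computed from a contingency table with `scipy.stats.chi2_contingency`. The function name `cramers_v_sketch` is hypothetical, and the bias-corrected variant used here is an assumption: the source of the `f_aux.cramers_v` helper is not shown in this notebook, so the two may differ in detail.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v_sketch(table):
    """Bias-corrected Cramér's V for a contingency table (illustrative sketch)."""
    table = np.asarray(table, dtype=float)
    # Pearson chi-square without the continuity correction
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    r, k = table.shape
    phi2 = chi2 / n
    # Bias correction of phi^2 and of the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0

# Counts of TARGET (rows) by NAME_CONTRACT_TYPE (columns) in the training set
contract_table = [[204044, 22104], [18586, 1274]]
print(cramers_v_sketch(contract_table))
```

With `correction=False`, a perfectly associated 2x2 table gives exactly 1. Diagonal values slightly below 1 in a Cramér's V matrix may come from a continuity correction inside the helper being used, which cannot be confirmed here.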

In [26]:
df_cat_bool = pd.concat([df_loan_train[df_loan_cat], df_loan_train[df_loan_bool]], axis=1)
df_cat_bool.columns.values
Out[26]:
array(['FONDKAPREMONT_MODE', 'FLOORSMIN_MODE', 'FLOORSMIN_MEDI',
       'ELEVATORS_MEDI', 'ELEVATORS_MODE', 'WALLSMATERIAL_MODE',
       'ENTRANCES_MEDI', 'ENTRANCES_MODE', 'HOUSETYPE_MODE',
       'FLOORSMAX_MODE', 'EMERGENCYSTATE_MODE', 'OCCUPATION_TYPE',
       'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON',
       'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY',
       'AMT_REQ_CREDIT_BUREAU_YEAR', 'AMT_REQ_CREDIT_BUREAU_QRT',
       'NAME_TYPE_SUITE', 'OBS_60_CNT_SOCIAL_CIRCLE',
       'DEF_60_CNT_SOCIAL_CIRCLE', 'OBS_30_CNT_SOCIAL_CIRCLE',
       'DEF_30_CNT_SOCIAL_CIRCLE', 'CNT_FAM_MEMBERS',
       'HOUR_APPR_PROCESS_START', 'NAME_CONTRACT_TYPE', 'FLAG_OWN_CAR',
       'CODE_GENDER', 'CNT_CHILDREN', 'NAME_INCOME_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'NAME_EDUCATION_TYPE',
       'FLAG_OWN_REALTY', 'REGION_RATING_CLIENT',
       'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START',
       'REG_REGION_NOT_LIVE_REGION', 'FLAG_MOBIL', 'FLAG_EMP_PHONE',
       'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'TARGET',
       'LIVE_REGION_NOT_WORK_REGION', 'FLAG_EMAIL', 'FLAG_PHONE',
       'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY',
       'LIVE_CITY_NOT_WORK_CITY', 'REG_REGION_NOT_WORK_REGION',
       'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_2',
       'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_10',
       'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_7',
       'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13',
       'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_17',
       'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_14',
       'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21'], dtype=object)
In [27]:
confusion_matrix = pd.crosstab(df_loan_train["TARGET"], df_loan_train["NAME_CONTRACT_TYPE"])
print(confusion_matrix)
f_aux.cramers_v(confusion_matrix.values)
NAME_CONTRACT_TYPE  Cash loans  Revolving loans
TARGET                                         
0                       204044            22104
1                        18586             1274
Out[27]:
np.float64(0.031114763938304826)
In [28]:
confusion_matrix = pd.crosstab(df_loan_train["TARGET"], df_loan_train["TARGET"])
f_aux.cramers_v(confusion_matrix.values)
Out[28]:
np.float64(0.9999726127135284)
In [29]:
corr_cats = f_aux.corr_cat(df=df_cat_bool, target='TARGET' ,target_transform=True)
corr_cats
Out[29]:
FONDKAPREMONT_MODE WALLSMATERIAL_MODE HOUSETYPE_MODE EMERGENCYSTATE_MODE OCCUPATION_TYPE NAME_TYPE_SUITE NAME_CONTRACT_TYPE FLAG_OWN_CAR CODE_GENDER NAME_INCOME_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_EDUCATION_TYPE FLAG_OWN_REALTY WEEKDAY_APPR_PROCESS_START TARGET
FONDKAPREMONT_MODE 1.000000 0.350329 0.395533 0.461823 0.031637 0.016696 0.023575 0.015057 0.012622 0.028740 0.024369 0.031720 0.044419 0.017970 0.005678 0.031948
WALLSMATERIAL_MODE 0.350329 1.000000 0.559573 0.690289 0.032516 0.014038 0.029114 0.034645 0.020289 0.031086 0.033476 0.043840 0.063345 0.029822 0.003958 0.044376
HOUSETYPE_MODE 0.395533 0.559573 1.000000 0.669327 0.046282 0.019799 0.028316 0.033259 0.019805 0.043217 0.043550 0.045783 0.068319 0.023046 0.002006 0.040940
EMERGENCYSTATE_MODE 0.461823 0.690289 0.669327 1.000000 0.057169 0.025698 0.028490 0.035851 0.021454 0.054398 0.054786 0.060894 0.086814 0.022301 0.005163 0.042496
OCCUPATION_TYPE 0.031637 0.032516 0.046282 0.057169 1.000000 0.020719 0.061912 0.256621 0.358815 0.289628 0.090566 0.044058 0.188272 0.049178 0.018078 0.081136
NAME_TYPE_SUITE 0.016696 0.014038 0.019799 0.025698 0.020719 1.000000 0.029978 0.042797 0.044520 0.020343 0.061945 0.019291 0.025099 0.073248 0.016833 0.009675
NAME_CONTRACT_TYPE 0.023575 0.029114 0.028316 0.028490 0.061912 0.029978 0.999976 0.005527 0.014329 0.061772 0.047759 0.027405 0.067877 0.068083 0.015118 0.031115
FLAG_OWN_CAR 0.015057 0.034645 0.033259 0.035851 0.256621 0.042797 0.005527 0.999991 0.345930 0.156379 0.167302 0.039645 0.097645 0.000509 0.003590 0.021341
CODE_GENDER 0.012622 0.020289 0.019805 0.021454 0.358815 0.044520 0.014329 0.345930 1.000000 0.119663 0.118149 0.047076 0.018635 0.043935 0.004841 0.055814
NAME_INCOME_TYPE 0.028740 0.031086 0.043217 0.054398 0.289628 0.020343 0.061772 0.156379 0.119663 1.000000 0.112233 0.054499 0.103975 0.072251 0.012171 0.063505
NAME_FAMILY_STATUS 0.024369 0.033476 0.043550 0.054786 0.090566 0.061945 0.047759 0.167302 0.118149 0.112233 1.000000 0.067538 0.052131 0.051105 0.003140 0.039752
NAME_HOUSING_TYPE 0.031720 0.043840 0.045783 0.060894 0.044058 0.019291 0.027405 0.039645 0.047076 0.054499 0.067538 1.000000 0.042252 0.226447 0.003215 0.037953
NAME_EDUCATION_TYPE 0.044419 0.063345 0.068319 0.086814 0.188272 0.025099 0.067877 0.097645 0.018635 0.103975 0.052131 0.042252 1.000000 0.030644 0.005185 0.057593
FLAG_OWN_REALTY 0.017970 0.029822 0.023046 0.022301 0.049178 0.073248 0.068083 0.000509 0.043935 0.072251 0.051105 0.226447 0.030644 0.999990 0.024692 0.006420
WEEKDAY_APPR_PROCESS_START 0.005678 0.003958 0.002006 0.005163 0.018078 0.016833 0.015118 0.003590 0.004841 0.012171 0.003140 0.003215 0.005185 0.024692 1.000000 0.004558
TARGET 0.031948 0.044376 0.040940 0.042496 0.081136 0.009675 0.031115 0.021341 0.055814 0.063505 0.039752 0.037953 0.057593 0.006420 0.004558 0.999973
In [30]:
plt.figure(figsize=(15,8))
sns.heatmap(corr_cats, annot=True, fmt='.3f', cmap='YlGnBu')
plt.title('Cramers V Matrix', fontdict={'size':'17'})
plt.show()
[Figure: heatmap of the Cramér's V matrix for the categorical variables, titled 'Cramers V Matrix']
In [31]:
warnings.filterwarnings("ignore")

corr_bool = f_aux.corr_cat_boolean(df_loan_train[df_loan_bool])
corr_bool
Out[31]:
REG_REGION_NOT_LIVE_REGION FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE TARGET LIVE_REGION_NOT_WORK_REGION FLAG_EMAIL FLAG_PHONE REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY REG_REGION_NOT_WORK_REGION FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_11 FLAG_DOCUMENT_10 FLAG_DOCUMENT_9 FLAG_DOCUMENT_8 FLAG_DOCUMENT_7 FLAG_DOCUMENT_6 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_19 FLAG_DOCUMENT_18 FLAG_DOCUMENT_17 FLAG_DOCUMENT_16 FLAG_DOCUMENT_15 FLAG_DOCUMENT_14 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21
REG_REGION_NOT_LIVE_REGION 0.999866 0.000000 0.037046 0.064987 0.000000 0.004242 0.090931 0.018803 0.002026 0.339547 0.142506 0.010829 0.452122 0.000000 0.011142 0.000000 0.033288 0.105901 0.001342 0.017142 0.023536 0.000000 0.023963 0.000000 0.002580 0.000000 0.008837 0.000000 0.005940 0.000000 0.003106 0.000610 0.001759
FLAG_MOBIL 0.000000 0.499995 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002335 0.000000 0.000000 0.000000 0.000000 0.000000 0.010736 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_EMP_PHONE 0.037046 0.000000 0.999986 0.233843 0.011449 0.046043 0.096618 0.062399 0.014936 0.092256 0.255917 0.218957 0.108618 0.000000 0.018527 0.001192 0.248955 0.029219 0.000000 0.023234 0.122020 0.000000 0.597988 0.000000 0.026118 0.009385 0.040904 0.005971 0.042980 0.014717 0.023547 0.009698 0.007439
FLAG_WORK_PHONE 0.064987 0.000000 0.233843 0.999987 0.021613 0.027957 0.042017 0.012190 0.293571 0.045825 0.121108 0.110478 0.068964 0.003075 0.036079 0.000000 0.061083 0.123001 0.000598 0.007848 0.020747 0.000000 0.138504 0.000000 0.000000 0.011717 0.030950 0.000000 0.004875 0.008916 0.001766 0.000000 0.000000
FLAG_CONT_MOBILE 0.000000 0.000000 0.011449 0.021613 0.998932 0.000000 0.001495 0.006958 0.007272 0.000000 0.003522 0.003211 0.000000 0.000000 0.004577 0.000000 0.006392 0.000000 0.000000 0.008224 0.021809 0.001773 0.009241 0.000000 0.060600 0.004141 0.040618 0.014093 0.030104 0.012868 0.075213 0.000000 0.007145
TARGET 0.004242 0.000000 0.046043 0.027957 0.000000 0.999973 0.001437 0.002944 0.022757 0.046275 0.051897 0.032252 0.006048 0.000000 0.000000 0.005469 0.044135 0.003865 0.000000 0.003826 0.008008 0.000000 0.028538 0.000000 0.011950 0.000000 0.007022 0.002479 0.010595 0.005639 0.008764 0.000000 0.001196
LIVE_REGION_NOT_WORK_REGION 0.090931 0.000000 0.096618 0.042017 0.001495 0.001437 0.999948 0.024498 0.005080 0.023564 0.186336 0.237686 0.858512 0.000000 0.014603 0.000000 0.011904 0.005963 0.000000 0.015080 0.061096 0.000000 0.059169 0.000000 0.015899 0.000000 0.003540 0.000000 0.003389 0.004232 0.013542 0.000000 0.000000
FLAG_EMAIL 0.018803 0.000000 0.062399 0.012190 0.006958 0.002944 0.024498 0.999962 0.014745 0.014999 0.003383 0.003754 0.029056 0.000000 0.000000 0.001125 0.011470 0.004342 0.000000 0.009699 0.030388 0.000000 0.041923 0.000000 0.002992 0.000000 0.008508 0.000000 0.012151 0.002711 0.000000 0.000000 0.000000
FLAG_PHONE 0.002026 0.000000 0.014936 0.293571 0.007272 0.022757 0.005080 0.014745 0.999990 0.048395 0.045366 0.022972 0.003957 0.003022 0.074963 0.000000 0.007085 0.002399 0.003787 0.012624 0.004518 0.013119 0.008106 0.000000 0.006715 0.009186 0.003766 0.002557 0.009805 0.009215 0.009131 0.000000 0.000000
REG_CITY_NOT_LIVE_CITY 0.339547 0.000000 0.092256 0.045825 0.000000 0.046275 0.023564 0.014999 0.048395 0.999972 0.439962 0.029861 0.153122 0.000000 0.000000 0.000000 0.003458 0.056019 0.001801 0.005296 0.018503 0.000000 0.058091 0.000000 0.000000 0.004594 0.012980 0.000672 0.011521 0.000000 0.003654 0.000000 0.000000
REG_CITY_NOT_WORK_CITY 0.142506 0.000000 0.255917 0.121108 0.003522 0.051897 0.186336 0.003383 0.045366 0.439962 0.999989 0.825180 0.240299 0.000000 0.012870 0.000000 0.056723 0.033837 0.000000 0.000000 0.042682 0.000000 0.157524 0.000000 0.000000 0.002283 0.013347 0.000000 0.001868 0.000000 0.005424 0.000000 0.000951
LIVE_CITY_NOT_WORK_CITY 0.010829 0.000000 0.218957 0.110478 0.003211 0.032252 0.237686 0.003754 0.022972 0.029861 0.825180 0.999986 0.197453 0.000720 0.015206 0.000000 0.053971 0.000806 0.000000 0.005202 0.042419 0.000000 0.133376 0.000000 0.000000 0.000000 0.005987 0.000000 0.003546 0.000000 0.004516 0.002527 0.002813
REG_REGION_NOT_WORK_REGION 0.452122 0.000000 0.108618 0.068964 0.000000 0.006048 0.858512 0.029056 0.003957 0.153122 0.240299 0.197453 0.999958 0.000000 0.017619 0.000000 0.021723 0.058898 0.000000 0.019851 0.060633 0.000000 0.066960 0.000000 0.011145 0.000000 0.007072 0.000000 0.000000 0.003020 0.013681 0.000000 0.000000
FLAG_DOCUMENT_4 0.000000 0.000000 0.000000 0.003075 0.000000 0.000000 0.000000 0.000000 0.003022 0.000000 0.000000 0.000720 0.000000 0.972220 0.000000 0.000000 0.012711 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_5 0.011142 0.000000 0.018527 0.036079 0.004577 0.000000 0.014603 0.000000 0.074963 0.000000 0.012870 0.015206 0.017619 0.000000 0.999862 0.000000 0.192882 0.007178 0.000000 0.007170 0.036578 0.000000 0.038148 0.000000 0.006805 0.001212 0.010531 0.000000 0.011577 0.003298 0.006119 0.000366 0.000000
FLAG_DOCUMENT_2 0.000000 0.000000 0.001192 0.000000 0.000000 0.005469 0.000000 0.001125 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.954543 0.009591 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_3 0.033288 0.000000 0.248955 0.061083 0.006392 0.044135 0.011904 0.011470 0.007085 0.003458 0.056723 0.053971 0.021723 0.012711 0.192882 0.009591 0.999990 0.093042 0.007237 0.097963 0.466289 0.021690 0.486185 0.000000 0.020905 0.009128 0.008287 0.000000 0.032886 0.000000 0.000000 0.007662 0.022773
FLAG_DOCUMENT_11 0.105901 0.000000 0.029219 0.123001 0.000000 0.003865 0.005963 0.004342 0.002399 0.056019 0.033837 0.000806 0.058898 0.000000 0.007178 0.000000 0.093042 0.999479 0.000000 0.002743 0.017243 0.000000 0.018783 0.000000 0.002496 0.000000 0.004986 0.000000 0.005553 0.000000 0.001997 0.000000 0.000000
FLAG_DOCUMENT_10 0.001342 0.000000 0.000000 0.000598 0.000000 0.000000 0.000000 0.000000 0.003787 0.001801 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007237 0.000000 0.928569 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_9 0.017142 0.000000 0.023234 0.007848 0.008224 0.003826 0.015080 0.009699 0.012624 0.005296 0.000000 0.005202 0.019851 0.000000 0.007170 0.000000 0.097963 0.002743 0.000000 0.999478 0.018421 0.000000 0.019225 0.000000 0.000000 0.000000 0.000000 0.004882 0.007729 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_8 0.023536 0.002335 0.122020 0.020747 0.021809 0.008008 0.061096 0.030388 0.004518 0.018503 0.042682 0.042419 0.060633 0.000000 0.036578 0.000000 0.466289 0.017243 0.000000 0.018421 0.999973 0.003075 0.092426 0.000000 0.078396 0.000000 0.006243 0.004945 0.012813 0.022567 0.031373 0.000671 0.002117
FLAG_DOCUMENT_7 0.000000 0.000000 0.000000 0.000000 0.001773 0.000000 0.000000 0.000000 0.013119 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.021690 0.000000 0.000000 0.000000 0.003075 0.989794 0.003308 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_6 0.023963 0.000000 0.597988 0.138504 0.009241 0.028538 0.059169 0.041923 0.008106 0.058091 0.157524 0.133376 0.066960 0.000000 0.038148 0.000000 0.486185 0.018783 0.000000 0.019225 0.092426 0.003308 0.999975 0.000000 0.017395 0.004596 0.024457 0.002826 0.026379 0.009714 0.014107 0.005056 0.004459
FLAG_DOCUMENT_12 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.499995 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
FLAG_DOCUMENT_13 0.002580 0.000000 0.026118 0.000000 0.060600 0.011950 0.015899 0.002992 0.006715 0.000000 0.000000 0.000000 0.011145 0.000000 0.006805 0.000000 0.020905 0.002496 0.000000 0.000000 0.078396 0.000000 0.017395 0.000000 0.999429 0.000000 0.004689 0.000000 0.005238 0.000000 0.001730 0.033210 0.004520
FLAG_DOCUMENT_19 0.000000 0.000000 0.009385 0.011717 0.004141 0.000000 0.000000 0.000000 0.009186 0.004594 0.002283 0.000000 0.000000 0.000000 0.001212 0.000000 0.009128 0.000000 0.000000 0.000000 0.000000 0.000000 0.004596 0.000000 0.000000 0.996642 0.000000 0.000000 0.000000 0.000000 0.000000 0.039555 0.000000
FLAG_DOCUMENT_18 0.008837 0.010736 0.040904 0.030950 0.040618 0.007022 0.003540 0.008508 0.003766 0.012980 0.013347 0.005987 0.007072 0.000000 0.010531 0.000000 0.008287 0.004986 0.000000 0.000000 0.006243 0.000000 0.024457 0.000000 0.004689 0.000000 0.999753 0.000000 0.008647 0.001622 0.004138 0.086002 0.001231
FLAG_DOCUMENT_17 0.000000 0.000000 0.005971 0.000000 0.014093 0.002479 0.000000 0.000000 0.002557 0.000672 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.004882 0.004945 0.000000 0.002826 0.000000 0.000000 0.000000 0.000000 0.991665 0.000000 0.000000 0.000000 0.028338 0.000000
FLAG_DOCUMENT_16 0.005940 0.000000 0.042980 0.004875 0.030104 0.010595 0.003389 0.012151 0.009805 0.011521 0.001868 0.003546 0.000000 0.000000 0.011577 0.000000 0.032886 0.005553 0.000000 0.007729 0.012813 0.000000 0.026379 0.000000 0.005238 0.000000 0.008647 0.000000 0.999791 0.002112 0.004655 0.080686 0.000000
FLAG_DOCUMENT_15 0.000000 0.000000 0.014717 0.008916 0.012868 0.005639 0.004232 0.002711 0.009215 0.000000 0.000000 0.000000 0.003020 0.000000 0.003298 0.000000 0.000000 0.000000 0.000000 0.000000 0.022567 0.000000 0.009714 0.000000 0.000000 0.000000 0.001622 0.000000 0.002112 0.998359 0.000000 0.027209 0.000000
FLAG_DOCUMENT_14 0.003106 0.000000 0.023547 0.001766 0.075213 0.008764 0.013542 0.000000 0.009131 0.003654 0.005424 0.004516 0.013681 0.000000 0.006119 0.000000 0.000000 0.001997 0.000000 0.000000 0.031373 0.000000 0.014107 0.000000 0.001730 0.000000 0.004138 0.000000 0.004655 0.000000 0.999319 0.023345 0.000000
FLAG_DOCUMENT_20 0.000610 0.000000 0.009698 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002527 0.000000 0.000000 0.000366 0.000000 0.007662 0.000000 0.000000 0.000000 0.000671 0.000000 0.005056 0.000000 0.033210 0.039555 0.086002 0.028338 0.080686 0.027209 0.023345 0.996030 0.004427
FLAG_DOCUMENT_21 0.001759 0.000000 0.007439 0.000000 0.007145 0.001196 0.000000 0.000000 0.000000 0.000000 0.000951 0.002813 0.000000 0.000000 0.000000 0.000000 0.022773 0.000000 0.000000 0.000000 0.002117 0.000000 0.004459 0.000000 0.004520 0.000000 0.001231 0.000000 0.000000 0.000000 0.000000 0.004427 0.993053
In [32]:
plt.figure(figsize=(30,15))
sns.heatmap(corr_bool, annot=True, fmt='.3f', cmap='YlGnBu')
plt.title('Cramers V Matrix', fontdict={'size':'17'})
plt.show()
[Figure: heatmap of the Cramér's V matrix for the boolean variables, titled 'Cramers V Matrix']

Although none of the categorical or boolean variables shows a strong association with our target variable, the highest value corresponds to OCCUPATION_TYPE, which we already discussed in the graphical analysis. Its Cramér's V with TARGET is about 8%; this is small, but the variable could still matter in the model.

There are associations between 30% and 70% among the housing variables, such as the house type, the wall material and other characteristics of the building. These relatively high values are not a concern, since the relationship is a logical one.

There is also an association of 42.3% between the client's occupation and the type of organization they work for. A priori this is also a normal relationship and not a concern.
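Reading the strongest pairs off a large heatmap is error-prone, so a small helper can list them directly. The sketch below is illustrative: `top_pairs` is a hypothetical name, and the toy matrix reuses three values from the Cramér's V table of the categorical variables.

```python
import numpy as np
import pandas as pd

def top_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n largest off-diagonal values of a symmetric matrix."""
    # Keep only the strict upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Toy matrix with values copied from the categorical Cramér's V table
cols = ['WALLSMATERIAL_MODE', 'HOUSETYPE_MODE', 'EMERGENCYSTATE_MODE']
toy = pd.DataFrame(
    [[1.000000, 0.559573, 0.690289],
     [0.559573, 1.000000, 0.669327],
     [0.690289, 0.669327, 1.000000]],
    index=cols, columns=cols,
)
print(top_pairs(toy, n=2))
```

The same call could be applied to `corr_cats` or `corr_bool`, assuming they are square DataFrames with matching row and column labels.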

Weight of Evidence (WoE) and Information Value (IV)¶

WoE is a measure that maps each category (or bin) of a categorical or continuous variable onto a scale that reflects the relationship between the distributions of the two classes of the dependent variable (for example, "fraud" and "non-fraud"). For each category it is computed as follows:

$$ \mathrm{WoE} = \ln\left(\frac{\text{Distribution of the positive class}}{\text{Distribution of the negative class}}\right) $$

Here the distribution of a class in a category is the share of all observations of that class that fall in the category.

Interpretation:

  • If WoE > 0, the category holds a larger share of the positives than of the negatives (it points towards the positive class).
  • If WoE < 0, the category holds a larger share of the negatives than of the positives (it points towards the negative class).
  • WoE = 0 means the category holds the same share of positives and negatives, so it carries little information.
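As a worked example of the formula, the sketch below computes the WoE by hand for NAME_CONTRACT_TYPE, using the counts from the TARGET crosstab printed earlier in this notebook. The dictionary layout and variable names are illustrative; TARGET = 1 is taken as the positive class.

```python
import numpy as np

# Counts copied from the TARGET x NAME_CONTRACT_TYPE crosstab of the training set
counts = {
    'Cash loans': {'neg': 204044, 'pos': 18586},
    'Revolving loans': {'neg': 22104, 'pos': 1274},
}
total_pos = sum(c['pos'] for c in counts.values())
total_neg = sum(c['neg'] for c in counts.values())

# WoE = ln(share of positives in the category / share of negatives in the category)
woe = {
    cat: float(np.log((c['pos'] / total_pos) / (c['neg'] / total_neg)))
    for cat, c in counts.items()
}
print(woe)
```

Under this convention, cash loans come out with a small positive WoE and revolving loans with a negative one, so revolving loans hold a smaller share of the clients with payment difficulties than of the rest.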

The Information Value (IV) is a metric that quantifies the predictive power of a variable with respect to the target variable. It accumulates, over all groups, the differences between the proportions of positives and negatives.

The IV is computed by summing the WoE values weighted by the difference between the proportions of positives and negatives in each group:

$$ \mathrm{IV} = \sum \left(\text{Proportion of the positive class} - \text{Proportion of the negative class}\right) \times \mathrm{WoE} $$

Interpretation of the IV:

  • IV < 0.02: Low predictive power.
  • 0.02 < IV < 0.1: Weak predictive power.
  • 0.1 < IV < 0.3: Moderate predictive power.
  • 0.3 < IV < 0.5: High predictive power.
  • IV > 0.5: Very high predictive power (although care must be taken not to overfit the model).
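The two definitions can be combined in a short sketch. The names `woe_iv_sketch` and `iv_band` are hypothetical and this is not the `f_aux.calculate_woe_iv_categorical` helper, whose source is not shown here. Assigning a WoE of 0 to categories with no observations in one of the classes is an assumption made to avoid taking the logarithm of zero.

```python
import numpy as np
import pandas as pd

def woe_iv_sketch(df, variable, target):
    """WoE per category and total IV for a binary target (1 = positive class)."""
    # Rows: categories of the variable; columns: target values 0 and 1
    table = pd.crosstab(df[variable], df[target]).reindex(columns=[0, 1], fill_value=0)
    pct_pos = table[1] / table[1].sum()
    pct_neg = table[0] / table[0].sum()

    woe, iv = {}, 0.0
    for cat in table.index:
        p, q = pct_pos[cat], pct_neg[cat]
        if p == 0 or q == 0:
            # Assumption: WoE is undefined here, so fall back to 0
            woe[cat] = 0.0
            continue
        woe[cat] = float(np.log(p / q))
        iv += (p - q) * woe[cat]
    return woe, float(iv)

def iv_band(iv):
    """Map an IV to the interpretation bands listed above."""
    if iv < 0.02:
        return 'low'
    if iv < 0.1:
        return 'weak'
    if iv < 0.3:
        return 'moderate'
    if iv < 0.5:
        return 'high'
    return 'very high'

# Hypothetical toy data: category 'a' is mostly positive, 'b' mostly negative
toy = pd.DataFrame({'job': ['a'] * 4 + ['b'] * 4,
                    'y': [1, 1, 1, 0, 1, 0, 0, 0]})
woe, iv = woe_iv_sketch(toy, 'job', 'y')
print(woe, iv, iv_band(iv))
```

The toy data is deliberately extreme, so its IV lands in the top band; real variables usually sit much lower.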

Next we compute the WoE and IV for a few categorical variables that I find interesting, and then discuss the conclusions.

In [33]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='OCCUPATION_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Accountants': np.float64(-0.5400057046334321), 'Cleaning staff': np.float64(0.20125383936768831), 'Cooking staff': np.float64(0.317780773417085), 'Core staff': np.float64(-0.2541654427300936), 'Drivers': np.float64(0.3876294694822735), 'HR staff': np.float64(-0.3778470566427261), 'High skill tech staff': np.float64(-0.29303383912994113), 'IT staff': np.float64(-0.11435561393858834), 'Laborers': np.float64(0.29302322513685075), 'Low-skill Laborers': np.float64(0.8619870550944092), 'Managers': np.float64(-0.3019227627931952), 'Medicine staff': np.float64(-0.2105095609540891), 'Private service staff': np.float64(-0.2755682075223078), 'Realty agents': np.float64(-0.039935644637323194), 'Sales staff': np.float64(0.20619282273186823), 'Secretaries': np.float64(-0.14368624748642267), 'Security staff': np.float64(0.33001694046914937), 'Waiters/barmen staff': np.float64(0.33623333358368496), 'Desconocido': np.float64(-0.24043809051046666)}
IV de la variable: 0.08587967416283065
In [34]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='NAME_INCOME_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Businessman': 0, 'Commercial associate': np.float64(-0.08014465465518467), 'Maternity leave': np.float64(2.4324819935799025), 'Pensioner': np.float64(-0.43494908113169145), 'State servant': np.float64(-0.35974790110068705), 'Student': 0, 'Unemployed': np.float64(2.027016885471738), 'Working': np.float64(0.1878468975898912), 'Desconocido': 0}
IV de la variable: 0.05808599223176106
In [35]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='NAME_EDUCATION_TYPE', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'Academic degree': np.float64(-2.4503199290064686), 'Higher education': np.float64(-0.4393091653969691), 'Incomplete higher': np.float64(0.05657720465626268), 'Lower secondary': np.float64(0.3385501370579372), 'Secondary / secondary special': np.float64(0.11163766773089984), 'Desconocido': 0}
IV de la variable: 0.05154040418506241
In [36]:
woe_dict, iv = f_aux.calculate_woe_iv_categorical(df=df_loan_train, variable='CODE_GENDER', target='TARGET')

print("WoE por categoría:", woe_dict)
print("IV de la variable:", iv)
WoE por categoría: {'F': np.float64(-0.1579356962666567), 'M': np.float64(0.2556181541980332), 'XNA': 0, 'Desconocido': 0}
IV de la variable: 0.040237003552605975

Voy a comentar mis conclusiones de las 4 variables analizadas:

  • En la variable 'OCCUPATION_TYPE' se observa como en trabajos menos cualificados el coeficiente WoE es positivo, es decir, cuanto mayor sea el coeficiente, mayor proporción de 1 en TARGET tendrán este tipo de trabajos. Por tanto, los clientes con trabajos poco cualificados como 'low-skill laborers', 'Drivers', 'Security Staff' o 'Waiters' muestran mayor proporción de 1 en TARGET (dificultad de pago). A su vez, clientes con trabajos más cualificados tienen coeficientes negativos, que supone que la categoría tiene una mayor proporción de clientes con TARGET = 0.

  • En la variable 'NAME_INCOME_TYPE' observamos como 'Unemployed' y 'Maternity leave' tienen un gran coeficiente positivo, por lo que son buenos predictores para TARGET = 1 (dificultad de pago). Por otro lado, 'Pensioner' y 'State servant' tienen coeficientes negativos, que supone que la categoría tiene una mayor proporción de clientes con TARGET = 0. 'Businessman' tiene un valor de 0, lo que significa que la categoría tiene una distribución balanceada entre positivos y negativos

  • For 'NAME_EDUCATION_TYPE', clients with a higher level of education ('Academic degree', 'Higher education') have negative coefficients and clients with a lower level of education have positive coefficients, which is what one would expect.

  • I find 'CODE_GENDER' interesting: men ('M') have a higher coefficient than women ('F'), so a priori male clients show a higher proportion of TARGET = 1 (payment difficulties) than female clients.


As for the IV, all values fall in the range 0.02 < IV < 0.1, so each variable has weak predictive power on its own. This is expected: a strong model usually comes from combining several variables, and a single variable with very high predictive power over the target would be suspicious, since it could point to target leakage and lead to overfitting or bias.
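As a rough illustration of how these coefficients can be computed, the following is a minimal sketch and not the actual implementation of `f_aux.calculate_woe_iv_categorical`. It assumes the sign convention used above (a positive WoE means a higher share of TARGET = 1) and assumes a fallback of 0 when a category lacks one of the two classes, which would be consistent with the plain 0 values seen for 'XNA' and 'Desconocido'. The function name `woe_iv` and the toy data are hypothetical.

```python
import math
from collections import Counter


def woe_iv(categories, targets):
    """Return ({category: WoE}, IV) for a binary target where 1 is the event."""
    events = Counter(c for c, t in zip(categories, targets) if t == 1)
    non_events = Counter(c for c, t in zip(categories, targets) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe, iv = {}, 0.0
    for cat in sorted(set(categories)):
        e, n = events[cat], non_events[cat]
        if e == 0 or n == 0:
            # Assumption: the log ratio is undefined here, so fall back to 0
            # and let the category contribute nothing to the IV.
            woe[cat] = 0.0
            continue
        pct_events = e / total_events          # share of all TARGET = 1 rows
        pct_non_events = n / total_non_events  # share of all TARGET = 0 rows
        woe[cat] = math.log(pct_events / pct_non_events)
        iv += (pct_events - pct_non_events) * woe[cat]
    return woe, iv


# Toy data (hypothetical): 6 'Laborers' (3 events), 6 'Managers' (1 event),
# and 1 'Rare' row with no events.
toy_categories = ["Laborers"] * 6 + ["Managers"] * 6 + ["Rare"]
toy_targets = [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0, 0, 0] + [0]

woe, iv = woe_iv(toy_categories, toy_targets)
print(woe)
print(iv)
```

In this toy sample 'Laborers' gets a positive WoE, 'Managers' a negative one, and 'Rare' falls back to 0, mirroring the pattern discussed in the bullets above.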

Dataset export¶

In [37]:
print(df_loan_train.shape, df_loan_test.shape)
(246008, 122) (61503, 122)
In [38]:
df_loan_train.to_csv('../../data_loan_status/data_split/df_loan_train.csv', index=False) 
df_loan_test.to_csv('../../data_loan_status/data_split/df_loan_test.csv', index=False)

EDA conclusions¶

As an initial hypothesis, and in answer to the question posed for this assignment (is there a type of client more likely not to repay a loan?), our exploratory data analysis lets us infer which type of client would be more likely not to repay. Note that this client profile is my own hypothesis, based on the statistics visualized in the EDA, and we will be able to test it during feature engineering and modeling. In that part of the assignment we will revisit whether the hypothesis posed here holds.

Based on the exploratory data analysis carried out in the first 2 notebooks, we can expect the type of client who will have difficulty paying or fully repaying the loan to be:

  • A client with a low level of education
  • Who owns an old car
  • With a low-skilled job
  • Whose home is built with poor materials, especially wood
  • With a large family with more than 2 children
  • Who is unemployed or on leave

Later, during feature engineering and modeling, we will check whether this initial hypothesis, based on my interpretation of the statistics computed and visualized, holds.

In carrying out this exploratory data analysis we covered:

  1. Gaining a deep understanding of our data and of the business problem.
  2. Importing the data, checking its dimensions, and splitting and identifying the different variable types, with a visualization of each.
  3. Identifying, plotting and analyzing the target variable, concluding that it is clearly imbalanced.
  4. Splitting the dataset into train and test sets in a stratified way because of the imbalance in the target variable.
  5. Descriptive visualization of the variables, to understand their nature, distribution and relevance to the target variable.
  6. Treatment of outliers, understanding their importance and the impact they could have in the modeling phase.
  7. Treatment of missing values across all data types (numeric, boolean and categorical), learning about and reflecting on the different imputation methods and observing how they affect the distribution and descriptive statistics of the variables.
  8. Correlation analysis of the variables, to understand how high correlation affects the analysis of our target variable.

All of this showed us that we are working with a dataset containing many variables of different types, with which we aim to explain and predict the behavior of our target variable, that is, when a client may have difficulty repaying a loan.

With these conclusions, we have a complex problem ahead that will be a real challenge for our models: the simplest model of all would predict that no client has payment difficulties, and it would be wrong only 8.07% of the time. The goal will be to improve on that baseline by adding complexity to our analysis.
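The baseline described above can be sketched as follows. This is only an illustration under stated assumptions: the toy label list is synthetic (807 positives out of 10,000 rows, chosen to mimic the 8.07% rate) and the helper names `majority_baseline` and `recall` are hypothetical. It also shows why accuracy alone may be a poor yardstick here, since such a baseline never identifies a client with payment difficulties.

```python
from collections import Counter


def majority_baseline(targets):
    """Predict the most frequent class for every row; return (class, accuracy)."""
    counts = Counter(targets)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(targets)


def recall(targets, predictions, positive=1):
    """Share of actual positives that were predicted as positive."""
    true_pos = sum(
        1 for t, p in zip(targets, predictions) if t == positive and p == positive
    )
    actual_pos = sum(1 for t in targets if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Synthetic labels (assumption): 8.07% of rows have TARGET = 1.
y_toy = [1] * 807 + [0] * 9193

majority_class, accuracy = majority_baseline(y_toy)
predictions = [majority_class] * len(y_toy)
baseline_recall = recall(y_toy, predictions)

print(f"baseline accuracy: {accuracy:.4f}")
print(f"recall on TARGET = 1: {baseline_recall:.4f}")
```

Any model we train should beat this baseline on accuracy while also achieving a recall on TARGET = 1 above zero.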

Things to keep in mind when running models:¶

  • It may be necessary to balance the training data, for example with oversampling techniques.
  • We identified some variables as important for predicting payment difficulties, such as OCCUPATION_TYPE (job type), NAME_EDUCATION_TYPE (level of education), NAME_INCOME_TYPE (pensioner, student, working), and CNT_CHILDREN (number of children, as a proxy for family size), among others.
  • Mean Encoding could be used instead of One-Hot Encoding for categorical variables with many categories.
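To make the last point concrete, here is a minimal sketch of mean (target) encoding, assuming a binary target and a simple additive smoothing toward the global mean. The function names `fit_mean_encoding` and `apply_mean_encoding`, the `smoothing` parameter and the toy data are all hypothetical. The encoding is learned on the training rows only and then applied to other rows, to avoid leaking the test target into the features.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets, smoothing=10.0):
    """Learn {category: smoothed target mean} from training data only."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1
    # Small categories are pulled toward the global mean by the smoothing term.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_mean_encoding(categories, encoding, default):
    """Replace each category by its encoded value; unseen ones get `default`."""
    return [encoding.get(cat, default) for cat in categories]


# Toy training data (hypothetical): 4 'Drivers' (2 events), 4 'Managers' (1 event).
train_categories = ["Drivers"] * 4 + ["Managers"] * 4
train_targets = [1, 1, 0, 0, 0, 0, 0, 1]

encoding, global_mean = fit_mean_encoding(train_categories, train_targets, smoothing=2.0)

# A category never seen in training falls back to the global mean.
encoded_test = apply_mean_encoding(["Drivers", "Unseen"], encoding, default=global_mean)
print(encoding)
print(encoded_test)
```

Unlike One-Hot Encoding, this produces a single numeric column regardless of how many categories the variable has, which is the motivation for considering it on variables such as OCCUPATION_TYPE.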